Summary of dictionary plantcompound with minicorpus
To create this dictionary with a SPARQL query, I collected Wikidata IDs from the existing plant_compound dictionary, a dictionary of 2114 constituent chemical compounds extracted from essential oils and converted from essoldb1.0.
- Dr. Gitanjali Yadav, Ph.D., Computational Biology Laboratory, NIPGR (National Institute of Plant Genome Research), Lecturer, Dept. of Plant Sciences, University of Cambridge, and Ambarish Kumar are the authors of that dictionary.
- Clyde Davies (https://www.chem4word.co.uk/) and Peter Murray-Rust, Reader Emeritus in Molecular Informatics, Unilever Centre, Dept. of Chemistry, University of Cambridge, were the contributors; the last update was made by Ambarish Kumar.
- Old dictionary link: https://github.com/petermr/CEVOpen/blob/master/dictionary/eoCompound/work/eoCompound.xml
- Go to https://www.wikidata.org/wiki/Wikidata:Main_Page and click 'Query Service' in the left-hand column. This redirects you to the Wikidata Query Service page, where you can write your SPARQL query.
- Run the following SPARQL query:
SELECT ?item ?itemLabel ?itemAltLabel ?itemDescription ?wikipedia ?chemical_formula (GROUP_CONCAT(DISTINCT ?chemical_structure;separator="/") as ?t) WHERE {
VALUES ?item {
wd:Q151058 wd:Q151145 wd:Q408216 wd:Q3077502 wd:Q3133331 wd:Q3343334 wd:Q3348776 wd:Q3348789 wd:Q3374902 wd:Q4403492 wd:Q7320168 wd:Q9579862 wd:Q10354581 wd:Q10382439 wd:Q15090829 wd:Q15726084 wd:Q18456293 wd:Q21547157 wd:Q24764359 wd:Q27067434 wd:Q27093314 wd:Q27102270 wd:Q27102310 wd:Q27102309 wd:Q27104017 wd:Q27104582 wd:Q27104968 wd:Q27105216 wd:Q27105270 wd:Q27108568 wd:Q27108569 wd:Q27109437 wd:Q27109809 wd:Q27114893 wd:Q27115785 wd:Q27116059 wd:Q27116315 wd:Q27117882 wd:Q27121549 wd:Q27126497 wd:Q27132749 wd:Q27133173 wd:Q27133237 wd:Q27133440 wd:Q27134539 wd:Q27134973 wd:Q27136638 wd:Q27137382 wd:Q27137527 wd:Q27138027 wd:Q27138815 wd:Q27140449 wd:Q27145118 wd:Q27147832 wd:Q27149801 wd:Q27151348 wd:Q27155086 wd:Q27155315 wd:Q27156713 wd:Q27157517 wd:Q27157612 wd:Q27161158 wd:Q27161384 wd:Q27161428 wd:Q27161449 wd:Q27161582 wd:Q27161914 wd:Q27161916 wd:Q27162039 wd:Q27162243 wd:Q27163088 wd:Q27216226 wd:Q27251997 wd:Q27252089 wd:Q27259082 wd:Q27262779 wd:Q27264122 wd:Q27265613 wd:Q27270628 wd:Q27270652 wd:Q27271578 wd:Q27274890 wd:Q27277892 wd:Q27278976 wd:Q27280549 wd:Q27283468 wd:Q27283713 wd:Q27284461 wd:Q27289692 wd:Q27289978 wd:Q27290204 wd:Q75850 wd:Q150440 wd:Q150681 wd:Q150694 wd:Q150788 wd:Q150988 wd:Q15779 wd:Q28917 wd:Q151038 wd:Q158544 wd:Q161667 wd:Q161686 wd:Q173670 wd:Q179724 wd:Q418951 wd:Q418985 wd:Q419330 wd:Q419484 wd:Q419645 wd:Q419857 wd:Q420652 wd:Q421151 wd:Q421278 wd:Q422658 wd:Q424284 wd:Q424898 wd:Q424919 wd:Q425988 wd:Q612199 wd:Q663909 wd:Q1064625 wd:Q1323947 wd:Q1376658 wd:Q2041498 wd:Q2806593 wd:Q2988108 wd:Q3135041 wd:Q3304162 wd:Q3314420 wd:Q4117486 wd:Q4338679 wd:Q5254530 wd:Q6948297 wd:Q7076690 wd:Q7107372 wd:Q7250465 wd:Q7434119 wd:Q7577248 wd:Q7881242 wd:Q7883008 wd:Q10861391 wd:Q15296722 wd:Q15391928 wd:Q15634089 wd:Q16081199 wd:Q18208345 wd:Q21099594 wd:Q181003 wd:Q193572 wd:Q210385 wd:Q278332 wd:Q289496 wd:Q369777 wd:Q376994 wd:Q388740 wd:Q390681 wd:Q407560 wd:Q407669 wd:Q407936 wd:Q408384 wd:Q408883 wd:Q409799 
wd:Q410772 wd:Q411500 wd:Q412424 wd:Q413816 wd:Q414057 wd:Q414257 wd:Q414395 wd:Q415446 wd:Q416114 wd:Q416222 wd:Q416673 wd:Q417896 wd:Q418080 wd:Q27146999 wd:Q27279013 wd:Q27280014 wd:Q33495 wd:Q27894190 wd:Q27896677 wd:Q27896794 wd:Q57513843 wd:Q66499305 wd:Q15632727 wd:Q130336 wd:Q27290409 wd:Q27291190 wd:Q27291532 wd:Q27291753 wd:Q27292099 wd:Q27295951 wd:Q414189 wd:Q416222 wd:Q417328 wd:Q418104 wd:Q418672 wd:Q419164 wd:Q420648 wd:Q420698 wd:Q420701 wd:Q420894 wd:Q422597 wd:Q423357 wd:Q423956 wd:Q425161 wd:Q425827 wd:Q902204 wd:Q977405 wd:Q1193173 wd:Q2212471 wd:Q2631777 wd:Q2740687 wd:Q2823832 wd:Q3026266 wd:Q3234708 wd:Q3408685 wd:Q3743828 wd:Q4381113 wd:Q4545790 wd:Q5275146 wd:Q5613321 wd:Q6085540 wd:Q6528372 wd:Q6673936 wd:Q6673936 wd:Q6784821 wd:Q6813065 wd:Q6817554 wd:Q6817577 wd:Q6823933 wd:Q12831520 wd:Q15410959 wd:Q15410972 wd:Q151016 wd:Q151028 wd:Q423975 wd:Q425010 wd:Q1368869 wd:Q2103922 wd:Q2928816 wd:Q3407511 wd:Q4567614 wd:Q11066025 wd:Q15726046 wd:Q15726045 wd:Q15726084 wd:Q15726184 wd:Q16676086 wd:Q18456604 wd:Q20054528 wd:Q21024279 wd:Q22079607 wd:Q24514387 wd:Q24735110 wd:Q25383513 wd:Q27068156 wd:Q27098285 wd:Q27103186 wd:Q27106868 wd:Q27107347 wd:Q27116059 wd:Q27117020 wd:Q27121544 wd:Q27131974 wd:Q27135902 wd:Q27136640 wd:Q27138171 wd:Q27145004 wd:Q27146923 wd:Q27149580 wd:Q27149863 wd:Q27149867 wd:Q27149875 wd:Q27149877 wd:Q27155083 wd:Q27155416 wd:Q27157517 wd:Q27158231 wd:Q27158424 wd:Q27159285 wd:Q27159666 wd:Q27159671 wd:Q27160509 wd:Q27160824 wd:Q27161264 wd:Q27161414 wd:Q27161909 wd:Q27162017 wd:Q27162059 wd:Q27162104 wd:Q27162300 wd:Q27231522 wd:Q27236554 wd:Q27236791 wd:Q27252751 wd:Q27253032 wd:Q27254138 wd:Q27256022 wd:Q27257403 wd:Q27258072 wd:Q27258673 wd:Q27258900 wd:Q27261954 wd:Q27261969 wd:Q27262697 wd:Q27266306 wd:Q27266852 wd:Q27268172 wd:Q27268318 wd:Q27269023 wd:Q27269516 wd:Q27270907 wd:Q27272448 wd:Q27273075 wd:Q27273328 wd:Q27273646 wd:Q27273814 wd:Q27274360 wd:Q27275815 wd:Q27276901 wd:Q27277902 wd:Q27277945 
wd:Q27278826 wd:Q27279821 wd:Q27280248 wd:Q27280641 wd:Q27281343 wd:Q27283565 wd:Q27284204 wd:Q27284213 wd:Q229987 wd:Q283917 wd:Q297592 wd:Q312240 wd:Q27289660 wd:Q16392 wd:Q33103 wd:Q409309 wd:Q410836 wd:Q412979 wd:Q412986 wd:Q413192 wd:Q79739 wd:Q150843 wd:Q151082 wd:Q162259 wd:Q407418 wd:Q407426 wd:Q408120 wd:Q19291552 wd:Q26841319 wd:Q27143856 wd:Q27287011 wd:Q27287738 wd:Q27288059 wd:Q27293572 wd:Q27294925 wd:Q55958569 wd:Q56459843 wd:Q27285778 wd:Q27286284 wd:Q150968 wd:Q151123 wd:Q408209 wd:Q418813 wd:Q529438 wd:Q1225152 wd:Q1227317 wd:Q1371466 wd:Q2182744 wd:Q2726127 wd:Q3135039 wd:Q4083784 wd:Q4465009 wd:Q4686651 wd:Q9796963 wd:Q10261807 wd:Q11595687 wd:Q17992508 wd:Q18023029 wd:Q18023029 wd:Q21099100 wd:Q23057921 wd:Q27067421 wd:Q27089406 wd:Q27089413 wd:Q27102766 wd:Q27102773 wd:Q27103073 wd:Q27103285 wd:Q27105152 wd:Q27105270 wd:Q27106340 wd:Q27106357 wd:Q27106480 wd:Q27108062 wd:Q27108619 wd:Q27108622 wd:Q27108625 wd:Q27108643 wd:Q27108640 wd:Q27109826 wd:Q27110141 wd:Q27115765 wd:Q27116653 wd:Q27116872 wd:Q27119785 wd:Q27120590 wd:Q27121532 wd:Q27121861 wd:Q27123468 wd:Q27123469 wd:Q27124011 wd:Q27126489 wd:Q27130849 wd:Q27132237 wd:Q27132629 wd:Q27133237 wd:Q27137335 wd:Q27139701 wd:Q27146459 wd:Q27149763 wd:Q27149760 wd:Q27149840 wd:Q27149841 wd:Q27149845 wd:Q27149851 wd:Q27149854 wd:Q27149853 wd:Q27154913 wd:Q27155094 wd:Q27155098 wd:Q27155122 wd:Q27158118 wd:Q27158322 wd:Q27158324 wd:Q27158424 wd:Q27159510 wd:Q27159575 wd:Q27159719 wd:Q27159827 wd:Q27159863 wd:Q27160891 wd:Q27160892 wd:Q2270 wd:Q47118 wd:Q27887610 wd:Q52353 wd:Q111812 wd:Q118040 wd:Q144362 wd:Q150744 wd:Q191700 wd:Q193213 wd:Q150925 wd:Q161620 wd:Q179896 wd:Q190067 wd:Q410915 wd:Q410949 wd:Q412265 wd:Q412567 wd:Q422590 wd:Q422613 wd:Q422917 wd:Q423133 wd:Q423282 wd:Q424223 wd:Q5404454 wd:Q9197460 wd:Q15391928 wd:Q15410911 wd:Q15410914 wd:Q15410912 wd:Q3050158 wd:Q3324838 wd:Q3348790 wd:Q4566861 wd:Q4783532 wd:Q5045553 wd:Q5114549 wd:Q5198713 wd:Q5254530 wd:Q5277321 wd:Q5287808 
wd:Q413755 wd:Q414779 wd:Q415359 wd:Q415612 wd:Q416775 wd:Q416800 wd:Q416929 wd:Q419096 wd:Q419495 wd:Q419495 wd:Q419495 wd:Q419495 wd:Q419513 wd:Q419765 wd:Q419800 wd:Q419952 wd:Q421614 wd:Q421640 wd:Q422152 wd:Q424577 wd:Q726875 wd:Q780165 wd:Q903525 wd:Q905406 wd:Q909088 wd:Q1158234 wd:Q2212471 wd:Q2504388 wd:Q2692437 wd:Q2756479 wd:Q2815995 wd:Q2849857 wd:Q2920205 wd:Q2961728 wd:Q3033497 wd:Q211433 wd:Q225543 wd:Q284072 wd:Q312244 wd:Q372524 wd:Q409178 wd:Q409482 wd:Q409564 wd:Q409608 wd:Q410089 wd:Q410107 wd:Q410603 wd:Q18210424 wd:Q18341725 wd:Q18352160 wd:Q18354123 wd:Q19903910 wd:Q21071561 wd:Q27159530 wd:Q27270106 wd:Q27160941 wd:Q27161059 wd:Q27161796 wd:Q27161914 wd:Q27161913 wd:Q27161959 wd:Q27162104 wd:Q27162313 wd:Q27231417 wd:Q27236329 wd:Q27247611 wd:Q27253858 wd:Q27254390 wd:Q27256073 wd:Q27256574 wd:Q27258735 wd:Q27260969 wd:Q27263085 wd:Q27263792 wd:Q27265496 wd:Q27268572 wd:Q27268716 wd:Q27268865 wd:Q27269581 wd:Q27269851 wd:Q27271386 wd:Q27272938 wd:Q27273395 wd:Q27273443 wd:Q27277780 wd:Q27277935 wd:Q27278243 wd:Q27278791 wd:Q27279950 wd:Q27282268 wd:Q27282747 wd:Q27284396 wd:Q27285775 wd:Q27288773 wd:Q27291819 wd:Q27294287 wd:Q27295787 wd:Q27457043 wd:Q27894518 wd:Q256502 wd:Q285640 wd:Q300850 wd:Q375112 wd:Q402607 wd:Q408094 wd:Q409554 wd:Q409853 wd:Q410405 wd:Q412429 wd:Q414754 wd:Q415103 wd:Q415128 wd:Q415519 wd:Q416114 wd:Q416972 wd:Q420652 wd:Q421486 wd:Q424998 wd:Q425668 wd:Q1052617 wd:Q1064625 wd:Q1287838 wd:Q1995108 wd:Q2024187 wd:Q2173897 wd:Q2189778 wd:Q2191936 wd:Q2793438 wd:Q2862455 wd:Q3209146 wd:Q3266675 wd:Q3270746 wd:Q3278289 wd:Q3278324 wd:Q3278329 wd:Q3491255 wd:Q4596913 wd:Q4634135 wd:Q4634146 wd:Q4634173 wd:Q27121796 wd:Q27146999 wd:Q209454 wd:Q223077 wd:Q223098 wd:Q2405051 wd:Q2416556 wd:Q2813826 wd:Q2816000 wd:Q2816676 wd:Q5652290 wd:Q9579862 wd:Q9593998 wd:Q10869280 wd:Q15298225 wd:Q15726063 wd:Q20968579 wd:Q21546974 wd:Q22138333 wd:Q22162855 wd:Q22668732 wd:Q22668733 wd:Q22830264 wd:Q24063466 wd:Q24716505 wd:Q24791131 
wd:Q25933668 wd:Q26828654 wd:Q27089421 wd:Q27098188 wd:Q27102109 wd:Q27103091 wd:Q27104051 wd:Q27104073 wd:Q27105580 wd:Q27106808 wd:Q27108601 wd:Q27108606 wd:Q27108607 wd:Q27108605 wd:Q27108619 wd:Q27108620 wd:Q27110107 wd:Q27114062 wd:Q27117043 wd:Q27117204 wd:Q27121498 wd:Q27124011 wd:Q27126618 wd:Q27131275 wd:Q27132629 wd:Q27132748 wd:Q27134263 wd:Q27136198 wd:Q27137053 wd:Q27137093 wd:Q27138694 wd:Q27139427 wd:Q27147103 wd:Q27147100 wd:Q27147536 wd:Q27147537 wd:Q27155062 wd:Q27158870 wd:Q27159516 wd:Q27159532 wd:Q27159714 wd:Q27159757 wd:Q27160277 wd:Q27160306 wd:Q27160508 wd:Q27160551 wd:Q27160655 wd:Q27160790 wd:Q27160792 wd:Q27161228 wd:Q27161440 wd:Q27161470 wd:Q27162006 wd:Q27162125 wd:Q27231338 wd:Q27236683 wd:Q153 wd:Q83763 wd:Q174937 wd:Q209463 wd:Q221307 wd:Q223076 wd:Q223101 wd:Q223103 wd:Q223112 wd:Q225543 wd:Q229995 wd:Q241678 wd:Q47512 wd:Q49546 wd:Q61457 wd:Q4637201 wd:Q4734900 wd:Q7086489 wd:Q7107372 wd:Q7165332 wd:Q8083965 wd:Q18349104 wd:Q27282706 wd:Q27283339 wd:Q27284121 wd:Q27284213 wd:Q27284743 wd:Q27290087 wd:Q27290322 wd:Q27291021 wd:Q27291030 wd:Q27291532 wd:Q27460635 wd:Q27896700 wd:Q33143279 wd:Q49081089 wd:Q63396315 wd:Q63398108 wd:Q27251951 wd:Q27256728 wd:Q27259011 wd:Q27266333 wd:Q27270690 wd:Q27271541 wd:Q27272497 wd:Q27272774 wd:Q27275524 wd:Q27277127 wd:Q27278142 wd:Q27280605 wd:Q27281761 wd:Q27282559 wd:Q284072 wd:Q300850 wd:Q300852 wd:Q372291 wd:Q403037 wd:Q410836 wd:Q411073 wd:Q412366 wd:Q412403 wd:Q412429 wd:Q416114 wd:Q416126 wd:Q418164 wd:Q420698 wd:Q423439 wd:Q424957 wd:Q425512 wd:Q517266 wd:Q632384 wd:Q780165 wd:Q1051071 wd:Q1376658 wd:Q2020228 wd:Q2055416 wd:Q2080732 wd:Q2173897 wd:Q2493733 wd:Q2720011 wd:Q2813819 wd:Q2823832 wd:Q2920205 wd:Q3278308 wd:Q4596853 wd:Q4596879 wd:Q4596898 wd:Q5074300 wd:Q5404456 wd:Q10858039 wd:Q15046450 wd:Q15269711 wd:Q15391928 wd:Q15634116 wd:Q15634156 wd:Q15927659 wd:Q18347448 wd:Q19904086 wd:Q26840883 wd:Q27147084 wd:Q255564 wd:Q410888 wd:Q414225 wd:Q416036 wd:Q418077 wd:Q420449 
wd:Q421838 wd:Q1368869 wd:Q2416556 wd:Q2419305 wd:Q2448755 wd:Q2716147 wd:Q2813818 wd:Q3374882 wd:Q10878628 wd:Q14584726 wd:Q15635537 wd:Q15687205 wd:Q17239258 wd:Q20054515 wd:Q20054528 wd:Q21546974 wd:Q25385016 wd:Q26777005 wd:Q26777012 wd:Q26841225 wd:Q27089385 wd:Q27103655 wd:Q27104033 wd:Q27104917 wd:Q27105118 wd:Q27105200 wd:Q27105262 wd:Q27108598 wd:Q27108619 wd:Q27109437 wd:Q27114873 wd:Q27115606 wd:Q27115961 wd:Q27116105 wd:Q27116653 wd:Q27116830 wd:Q27117072 wd:Q27121555 wd:Q27121594 wd:Q27121593 wd:Q27122111 wd:Q27132147 wd:Q27132867 wd:Q27133324 wd:Q27136726 wd:Q27137053 wd:Q27146459 wd:Q27147814 wd:Q27147831 wd:Q27157591 wd:Q27159230 wd:Q27159265 wd:Q27159615 wd:Q27160277 wd:Q27160351 wd:Q27160424 wd:Q27160456 wd:Q27160475 wd:Q27160941 wd:Q27161059 wd:Q158587 wd:Q161617 wd:Q161632 wd:Q161662 wd:Q161666 wd:Q161664 wd:Q4370 wd:Q14985 wd:Q27335 wd:Q41576 wd:Q204030 wd:Q204036 wd:Q204182 wd:Q209384 wd:Q209438 wd:Q278809 wd:Q161667 wd:Q161683 wd:Q164785 wd:Q183300 wd:Q76933 wd:Q77003 wd:Q151733 wd:Q151797 wd:Q27256927 wd:Q27258561 wd:Q27258744 wd:Q27262621 wd:Q27266517 wd:Q27268613 wd:Q27269343 wd:Q27271177 wd:Q27271549 wd:Q27272785 wd:Q27273802 wd:Q27274053 wd:Q27275326 wd:Q27276882 wd:Q27277546 wd:Q27278050 wd:Q27278231 wd:Q27280371 wd:Q27281923 wd:Q27282877 wd:Q27284396 wd:Q27286127 wd:Q27287030 wd:Q27287134 wd:Q27289047 wd:Q27289517 wd:Q27289647 wd:Q27291021 wd:Q27291250 wd:Q27291338 wd:Q27291930 wd:Q27291987 wd:Q27292273 wd:Q27294928 wd:Q27889971 wd:Q67879598 wd:Q27161382 wd:Q27161739 wd:Q27161905 wd:Q27161912 wd:Q27161918 wd:Q27161938 wd:Q27162003 wd:Q27162126 wd:Q27162125 wd:Q27162130 wd:Q27162280 wd:Q27162308 wd:Q27163057 wd:Q27231437 wd:Q27236295 wd:Q27236395 wd:Q27236759 wd:Q27237245 wd:Q27237256 wd:Q27251778 wd:Q27251815 wd:Q27252265 wd:Q27252803
}
SERVICE wikibase:label {
bd:serviceParam wikibase:language "en".
?item rdfs:label ?itemLabel;
skos:altLabel ?itemAltLabel;
schema:description ?itemDescription.
}
OPTIONAL {
?wikipedia schema:about ?item;
schema:isPartOf <https://en.wikipedia.org/>.
}
OPTIONAL { ?item wdt:P117 ?chemical_structure. }
OPTIONAL { ?item wdt:P274 ?chemical_formula. }
}
GROUP BY ?item ?itemLabel ?itemAltLabel ?itemDescription ?wikipedia ?chemical_formula
ORDER BY ?item
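The VALUES list above was assembled by hand, and some IDs occur more than once (wd:Q419495, for example, appears four times). The repeats are probably harmless, since GROUP BY collapses identical rows, but a clean list is easier to check and extend. The following is a minimal Python sketch, assuming the IDs are available as whitespace-separated tokens; the function name build_values_block is hypothetical. It keeps the first occurrence of each ID and reports any token that is not a single well-formed ID, such as two IDs run together without a space.

```python
import re

# Accepts "Q123" or "wd:Q123"; anything else is reported as malformed.
QID_RE = re.compile(r"^(?:wd:)?(Q[1-9][0-9]*)$")


def build_values_block(tokens):
    """Return (VALUES clause, malformed tokens), keeping the first occurrence of each ID."""
    seen = set()
    qids = []
    malformed = []
    for tok in tokens:
        match = QID_RE.match(tok.strip())
        if not match:
            malformed.append(tok)
            continue
        qid = match.group(1)
        if qid not in seen:
            seen.add(qid)
            qids.append(qid)
    body = " ".join(f"wd:{q}" for q in qids)
    return f"VALUES ?item {{ {body} }}", malformed


if __name__ == "__main__":
    raw = "wd:Q419495 wd:Q419495 wd:Q15632727wd:Q130336 wd:Q6673936"
    clause, bad = build_values_block(raw.split())
    print(clause)
    print(bad)
```

Pasting the returned clause in place of the hand-written VALUES block leaves the rest of the query unchanged.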
- After getting the results, click 'Link' and then 'SPARQL endpoint'. The SPARQL results file will start downloading automatically.
- Use amidict to map the SPARQL results onto dictionary fields, with the following command:
amidict -vv --dictionary plantcompound --directory ins --input sparql create --informat wikisparqlxml --sparqlmap wikidataURL=item,wikipediaPage=wikipedia,name=itemLabel,term=itemLabel,Description=itemDescription,_p117_chemical_structure=t,_p274_chemical_formula=chemical_formula --transformName "wikidataID=EXTRACT(wikidataURL,.*/(.*))" --synonyms=itemAltLabel
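To see what the amidict step works with, here is a rough Python sketch of the same mapping. It assumes the downloaded file uses the W3C SPARQL Query Results XML format (which the --informat wikisparqlxml option suggests) and that alternative labels arrive as one comma-separated string; parse_sparql_xml, row_to_entry and the SAMPLE document are hypothetical illustrations, not amidict's code. The attribute-to-variable table is copied from --sparqlmap, and the regex is the one given to EXTRACT.

```python
import re
import xml.etree.ElementTree as ET

# Namespace of the W3C SPARQL Query Results XML Format.
NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

# Dictionary attribute -> SPARQL variable, copied from --sparqlmap above.
SPARQL_MAP = {
    "wikidataURL": "item",
    "wikipediaPage": "wikipedia",
    "name": "itemLabel",
    "term": "itemLabel",
    "Description": "itemDescription",
    "_p117_chemical_structure": "t",
    "_p274_chemical_formula": "chemical_formula",
}

# Same pattern as EXTRACT(wikidataURL,.*/(.*)): the greedy ".*/" consumes
# everything up to the last slash, so the group captures the Q-number.
EXTRACT_RE = re.compile(r".*/(.*)")


def parse_sparql_xml(xml_text):
    """Return one {variable: value} dict per <result>; unbound variables are absent."""
    root = ET.fromstring(xml_text)
    rows = []
    for result in root.findall("sr:results/sr:result", NS):
        row = {}
        for binding in result.findall("sr:binding", NS):
            value_el = next(iter(binding))  # the single <uri>, <literal> or <bnode> child
            row[binding.get("name")] = value_el.text or ""
        rows.append(row)
    return rows


def row_to_entry(row):
    """Build a dictionary-entry dict from one result row (illustrative only)."""
    entry = {attr: row[var] for attr, var in SPARQL_MAP.items() if var in row}
    match = EXTRACT_RE.match(entry.get("wikidataURL", ""))
    if match:
        entry["wikidataID"] = match.group(1)
    alt = row.get("itemAltLabel", "")
    if alt:
        # Assumption: the label service joins alternative labels with ", ".
        entry["synonyms"] = [s.strip() for s in alt.split(",") if s.strip()]
    return entry


# Made-up example document, not real query output.
SAMPLE = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="item"/><variable name="itemLabel"/><variable name="itemAltLabel"/></head>
  <results>
    <result>
      <binding name="item"><uri>http://www.wikidata.org/entity/Q123</uri></binding>
      <binding name="itemLabel"><literal xml:lang="en">example compound</literal></binding>
      <binding name="itemAltLabel"><literal xml:lang="en">alias one, alias two</literal></binding>
    </result>
  </results>
</sparql>"""

if __name__ == "__main__":
    for row in parse_sparql_xml(SAMPLE):
        print(row_to_entry(row))
```

For the real download, pass the file's bytes, e.g. parse_sparql_xml(open("query.xml", "rb").read()), where the file name is a placeholder. OPTIONAL variables with no value (an item without an English Wikipedia page, say) have no binding in the XML, so the corresponding attribute is simply left out of the entry.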
- Upload the dictionary to GitHub.
- The dictionary records the Wikidata ID, Wikidata URL, description, etc. of each entry.
- Duplicate terms were removed.
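Duplicate removal can also be scripted rather than done by hand. A minimal sketch, assuming the ami dictionary layout of a `<dictionary>` root with one `<entry term="...">` child per term; dedupe_entries is a hypothetical name, and treating terms as duplicates when they match case-insensitively is an assumption.

```python
import xml.etree.ElementTree as ET


def dedupe_entries(xml_text):
    """Remove <entry> elements whose term repeats an earlier one (case-insensitive).

    Returns the rewritten XML string and the list of removed terms.
    """
    root = ET.fromstring(xml_text)
    seen = set()
    removed = []
    # Iterate over a copy so removing children does not disturb the loop.
    for entry in list(root.findall("entry")):
        key = (entry.get("term") or "").strip().lower()
        if key in seen:
            root.remove(entry)
            removed.append(entry.get("term"))
        else:
            seen.add(key)
    return ET.tostring(root, encoding="unicode"), removed
```

Keying on the wikidataID attribute instead of the term would also catch the same compound listed under two different labels.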
Find the dictionary plant_compound.xml here
-
State of the dictionary:
- Number of terms: 898
- Number of terms with synonyms: 165
- Percentage of entries with Wikidata IDs: 100%
- Percentage of entries with Wikipedia pages: 100%
- Summary of page count in non-EN languages: NA
- Percentage of entries with images: 46.3%
- Work still to be done on the dictionary: a synonyms dictionary is present in the work folder and needs to be combined with the main dictionary
- User-facing documentation on the dictionary
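The counts in the list above can be recomputed whenever the dictionary changes. This Python sketch assumes, as the amidict command suggests but does not guarantee, that each `<entry>` carries wikidataID and wikipediaPage attributes and lists its synonyms as `<synonym>` child elements; dictionary_stats and percent are hypothetical helper names.

```python
import xml.etree.ElementTree as ET


def percent(count, total):
    """Percentage rounded to one decimal place; 0.0 for an empty dictionary."""
    return round(100.0 * count / total, 1) if total else 0.0


def dictionary_stats(xml_text):
    """Summarise an ami-style dictionary (attribute and element names are assumed)."""
    root = ET.fromstring(xml_text)
    entries = root.findall("entry")
    total = len(entries)
    return {
        "terms": total,
        "terms_with_synonyms": sum(1 for e in entries if e.find("synonym") is not None),
        "pct_wikidataID": percent(sum(1 for e in entries if e.get("wikidataID")), total),
        "pct_wikipediaPage": percent(sum(1 for e in entries if e.get("wikipediaPage")), total),
    }
```

Called on the bytes of plant_compound.xml (open the file in "rb" mode), it should reproduce the term, synonym, Wikidata and Wikipedia figures, provided the layout assumption holds.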
-
Searching with the dictionary:
- Test corpus (corpora used): essential oil and plant compound
- Links to output of searches: https://github.com/petermr/CEVOpen/tree/master/minicorpora/essoilcomp
- Examples of search commands: shown below
- Brief tutorial on how to search with the dictionary: see the getpapers and ami steps below
-
Use of the dictionary:
- Demonstration that other colleagues can use, and have used, the dictionary: https://github.com/petermr/CEVOpen/wiki/Testing-dictionaries-against-corpus and https://drive.google.com/file/d/1mNTwHEOjYG17DlJp3pyKyVk9z8bGr_fx/view?usp=sharing
- Development of a compound dictionary to serve as a tool for searching and annotating scientific articles
- Testing getpapers, a web scraper for open-access scientific literature, and using it to create a corpus of plant compound and essential oil papers
- Running a dictionary-based search within the created corpus and drawing relationships between plant compounds and essential oils
- Creating a minicorpus using getpapers and ami queries. Installation of getpapers & ami
-
getpapers is a simple, powerful tool for querying repositories of scholarly articles with a one-line command. It downloads the freely available research papers that match a query, in full-text and XML format, to your local machine.
- The getpapers command starts the process, and -q gives the query to search for. The query is enclosed in double quotes, as in "(plant compound) AND (essential oil)". Next, -o sets the output directory; the argument after it is the directory name, essoilcomp in our case. Then -x and -p request XML and PDF files respectively, and -k 100 limits the search to 100 papers.
- getpapers was used to create a corpus of essential oil and plant compound papers named essoilcomp.
General code syntax: getpapers -q "<query>" -o <output directory> -x -p -k <number of papers required>
Query code:
getpapers -q "(plant compound) AND (essential oil)" -o essoilcomp -x -p -k 100
This built a corpus of 92 articles with full text and XML files.
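If the corpus needs to be rebuilt reproducibly, the same getpapers call can be driven from Python. This sketch only uses the flags described above; getpapers_args and run_getpapers are hypothetical helper names. Passing the arguments as a list keeps the query a single argument without any shell quoting.

```python
import shlex
import shutil
import subprocess


def getpapers_args(query, outdir, limit, xml=True, pdf=True):
    """Build the getpapers argument list using only the flags described on this page."""
    args = ["getpapers", "-q", query, "-o", outdir]
    if xml:
        args.append("-x")  # download full-text XML
    if pdf:
        args.append("-p")  # download PDFs
    args += ["-k", str(limit)]  # cap the number of papers
    return args


def run_getpapers(args):
    """Run getpapers if it is on PATH; raises if it is missing or exits with an error."""
    exe = shutil.which(args[0])
    if exe is None:
        raise FileNotFoundError("getpapers is not installed or not on PATH")
    subprocess.run([exe, *args[1:]], check=True)


if __name__ == "__main__":
    args = getpapers_args("(plant compound) AND (essential oil)", "essoilcomp", 100)
    # POSIX-shell rendering of the same command, for copy-pasting into a terminal.
    print(shlex.join(args))
```

shlex.join prints POSIX-shell quoting (single quotes); at a Windows command prompt, put double quotes around the query as in the command above.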
- ami is a framework for gathering, searching and transforming scholarly publications, oriented towards STEMM (Science, Technology, Engineering, Medicine, Maths).
ami section splits each research paper into front, body, back, floats and groups. Sectioning the downloaded files creates a tree structure that makes the content of each file easier to explore. Sectioning is done with ami's section command, which runs from the command prompt.
General code syntax: ami -p <cproject> section
Query code:
ami -p essoilcomp section
ami search searches the project for the terms in your dictionary and reports the frequency of each term, along with a histogram for the corpus.
General code syntax: ami -p <cproject directory> search --dictionary <path>
Query code:
ami -p "essoilcomp" search --dictionary plant_compound.xml
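At its core, a dictionary search counts how often each dictionary term occurs in the corpus. The following is a toy Python illustration of that idea, not ami's implementation: count_terms is hypothetical, matching is case-insensitive on whole words, and the example sentences are made up.

```python
import re
from collections import Counter


def count_terms(texts, terms):
    """Count case-insensitive, whole-word occurrences of each term across all texts."""
    # (?<!\w) and (?!\w) stop a term matching inside a longer word,
    # e.g. "limonene" inside "limonenes".
    patterns = {
        term: re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE)
        for term in terms
    }
    counts = Counter({term: 0 for term in terms})
    for text in texts:
        for term, pattern in patterns.items():
            counts[term] += len(pattern.findall(text))
    return counts


if __name__ == "__main__":
    docs = ["Limonene and linalool were found; limonene dominated.", "No terpenes here."]
    # Crude text histogram, most frequent term first.
    for term, n in count_terms(docs, ["limonene", "linalool", "pinene"]).most_common():
        print(f"{term:10} {'#' * n} {n}")
```

Calling most_common() on the result gives the terms in descending order of frequency, which is the ordering a frequency table or histogram would use.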
getpapers collected the freely available papers from Europe PMC.
FIGURE 1: OUTPUT OF getpapers
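A quick way to check what getpapers actually downloaded is to count the paper folders. This sketch assumes getpapers saves each paper in its own subdirectory of the output directory, with files named fulltext.xml and fulltext.pdf; those file names and the helper summarise_corpus are assumptions, so adjust them to what appears in your essoilcomp folder.

```python
from pathlib import Path


def summarise_corpus(cproject_dir):
    """Count paper folders and how many contain the (assumed) full-text files."""
    folders = [d for d in Path(cproject_dir).iterdir() if d.is_dir()]
    return {
        "papers": len(folders),
        # sum() over booleans counts the True values.
        "with_xml": sum((d / "fulltext.xml").is_file() for d in folders),
        "with_pdf": sum((d / "fulltext.pdf").is_file() for d in folders),
    }
```

For example, summarise_corpus("essoilcomp") run next to the corpus directory reports how many of the downloaded papers have XML and PDF full text.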
Results of ami section, which splits each paper in the directory into sections.
FIGURE 2: OUTPUT OF ami section
ami search results are given as a table, a histogram, and per-paper results in each paper's folder.
FIGURE 4: OUTPUT OF AMI SEARCH IN TABLE WITH FREQUENCY
FIGURE 5: PLOT OF .SVG FILE