
Project: Dictionaries

Ambreen H edited this page Jun 14, 2020 · 12 revisions

Name

DistributedDictionary

Why needed

AMI uses a large number of dictionaries, which may be developed independently and globally. AMIDictionaries are now spread over many sites, with different degrees of compliance and different location types (in-jar resources, distributed $DICTIONARY, local files, web sites). It's fun and a mess, just as the 1994 web was!

Similar/previous work

  • amidict org.contentmine.ami.tools.dictionary. Primary tools for creating, editing, translating, displaying, merging.

Proposed work

Definitions:

  • define the syntax
  • namespaces?

Tools

  • amidict-based: create, display, search, translate (the last is mainly json<->xml)
  • ami search supports distributed dictionaries but needs testing.

Developers

  • Peter Murray-Rust
  • ???

Testing

  • Richard Light
  • Emanuel Faria

Project page

???

Current state

Main use is in-jar resources. The distributed syntax works in ami search but is probably out-of-date. The dictionaries are not all Wikidata-ified.

Priority Dictionaries

  • country (Ambreen)
  • disease (Priya)
  • virus (vertebrate) (Kareena)
  • drugs (Rajan)
  • funders (PMR)

Running ami search for the "country dictionary"

Tester: Ambreen Hamadani

The ami search tool was used to test the country dictionary.

  1. getpapers was used to create a directory of 1000 papers (including full texts wherever available):
     getpapers -q "viral epidemics" -o countr_dict -f v_epid/log.txt -x -p -k 1000

  2. ami search was then run on this directory using the country dictionary:
     ami -p countr_dict search --dictionary country

  3. After a successful run, HTML documents were created that classified the papers by country and reported the frequency of each country.
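
The two commands in the steps above can be collected into one small script. This is only a sketch: it assumes getpapers and ami are installed and on the PATH, and the function names run and country_dictionary_test, as well as the DRY_RUN switch, are made up for illustration. By default it only prints the commands, so nothing is downloaded or searched until DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch of the country-dictionary test run (hypothetical helper names).

# run: print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Arguments are joined with spaces, so the original quoting is not shown.
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# country_dictionary_test <cProject-dir> <query> <max-papers>
country_dictionary_test() {
  # Step 1: fetch the papers into the cProject directory
  # (log file path as in the original run).
  run getpapers -q "$2" -o "$1" -f v_epid/log.txt -x -p -k "$3" || return 1
  # Step 2: run ami search; note that -p comes before the search subcommand.
  run ami -p "$1" search --dictionary country
}

country_dictionary_test countr_dict "viral epidemics" 1000
```

Running the script with DRY_RUN=0 in the environment would execute the same two commands instead of printing them.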

ISSUES:

  • ami search does not work unless the cProject directory (-p) is specified before the search subcommand. For example, the command
    ami search --dictionary country -p countr_dict1
    throws the following error:
================================
-v to see generic values

Specific values (AMISearchTool)
================================
created COMMAND: word(frequencies)xpath:@count>20~w.stopwords:pmcstop.txt_stopwords.txt search(country) search(-p) search(countr_dict1)
0    [main] DEBUG org.contentmine.ami.tools.AbstractAMISearchTool  - old style search command); to be changed
0 [main] DEBUG org.contentmine.ami.tools.AbstractAMISearchTool  - old style search command); to be changed
>ERROR: requires cProject

The correct command, in this case, is: ami -p countr_dict1 search --dictionary country
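
One way to avoid this ordering mistake is a thin wrapper that always places -p before the subcommand. The sketch below is hypothetical (ami_search is not part of ami) and assumes ami is on the PATH; it only rearranges the two arguments into the order shown above.

```shell
# Hypothetical wrapper: ami_search <cProject-dir> <dictionary>
# Always passes -p before the search subcommand, as ami requires.
ami_search() {
  if [ "$#" -ne 2 ]; then
    echo "usage: ami_search <cProject-dir> <dictionary>" >&2
    return 2
  fi
  ami -p "$1" search --dictionary "$2"
}

# Example call (not executed here):
#   ami_search countr_dict1 country
```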
