petermr edited this page Aug 6, 2020 · 5 revisions

amidict

overview

amidict is a command for creating, modifying or analyzing dictionaries. To use the dictionaries for searching, run: ami ... search --dictionary <dict1> [<dict2>...]

amidict is a wrapper command that runs AMIDict.java. It calls main in org.contentmine.ami.tools.AMIDict, which uses the picocli system to parse the command line and then runs the appropriate subcommand through subclasses of AbstractAMITool. Although ami and amidict are distributed in the same program (ami3) and both use picocli command lines, they are separate commands and may use different mnemonics.

amidict has several subcommands: create, display, search and translate.

Several picocli options are required to control input and output, and there are few defaults. For example, to create a dictionary you will usually need a directory (where to put it) and a dictionary name (what to call it). Examples:

--dictionary

The name of the dictionary (not the filename, though it may be used to read or create the filename).

--directory

The directory (folder) where the dictionaries are found (display, search, translate) or created (create).

We may try to streamline these options later.

See AMIDict.java and AbstractAMIDictTool.java in org.contentmine.ami.tools/, and the subcommands in org.contentmine.ami.tools.dictionary/.

intended uses

Generally amidict is run once to create a dictionary and then at irregular intervals to update and annotate it. It does not interface directly with the ami system of chainable commands, which uses the dictionaries through ami ... search --dictionary ....

amidict picocli

Generated on 2020-07-09 by running amidict --help

usage and options

Note: these options are needed by the subcommands.

Usage: amidict [OPTIONS] COMMAND

`amidict` is a command suite for managing dictionary:


Parameters:
===========
      [@<filename>...]   One or more argument files containing options.
Options:
========
  -d, --dictionary=<dictionaryList>...
                         input or output dictionary name/s. for 'create' must be singular; when 'display' or
                           'translate', any number. Names should be lowercase, unique. [a-z][a-z0-9._]. Dots can be
                           used to structure dictionaries into directories. Dictionary names are relative to
                           'directory'. If <directory> is absent then dictionary names are absolute.

The dictionary value (the name without a file suffix such as .xml) is the normal way of referring to a dictionary.
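A minimal Python sketch of the naming rule, assuming that the help text's `[a-z][a-z0-9._]` means one lowercase letter followed by any number of lowercase letters, digits, dots or underscores. Both function names are hypothetical (not part of ami3), and the check is deliberately loose: it does not reject, for example, a trailing dot.

```python
import re

# Assumption: "[a-z][a-z0-9._]" means one lowercase letter followed by
# any number of lowercase letters, digits, dots or underscores.
DICTIONARY_NAME = re.compile(r"[a-z][a-z0-9._]*")

def is_valid_dictionary_name(name: str) -> bool:
    """Loose check of a dictionary name against the pattern above."""
    return DICTIONARY_NAME.fullmatch(name) is not None

def dictionary_value(filename: str) -> str:
    """Strip the .xml suffix, e.g. 'disease4.xml' -> 'disease4'."""
    return filename[: -len(".xml")] if filename.endswith(".xml") else filename
```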

      --directory=<directory>
                         top directory containing dictionary/s. Subdirectories will use structured names (NYI). Thus
                           dictionary 'animals' is found in '<directory>/animals.xml', while 'plants.parts' is found in
                           <directory>/plants/parts.xml. Required for relative dictionary names.

The directory is often the folder where all of a user's dictionaries are kept.
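The name-to-file mapping described for --directory can be sketched in Python as below. `dictionary_path` is a hypothetical helper, not an ami3 API. It handles only relative names, because the help text does not say how absolute names are resolved, and it implements dotted (structured) names even though the help text marks them as not yet implemented (NYI) in amidict.

```python
from pathlib import Path

def dictionary_path(name: str, directory: str) -> Path:
    """Resolve a relative dictionary name to its XML file under `directory`.

    'animals'      -> <directory>/animals.xml
    'plants.parts' -> <directory>/plants/parts.xml  (structured name, NYI in amidict)
    """
    *subdirs, leaf = name.split(".")
    return Path(directory, *subdirs, leaf + ".xml")
```

For example, `dictionary_path("plants.parts", "dicts")` gives `dicts/plants/parts.xml` on a POSIX system.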

  -h, --help             Show this help message and exit.
  -V, --version          Print version information and exit.
General Options:
  -i, --input=FILE       Input filename, containing input for dictionary. its basename becomes the inputname

Mainly used by create.

  -n, --inputname=PATH   User's basename for inputfiles (e.g. foo/bar/<basename>.txt). The default name for the
                           dictionary; required for 'terms`
  -L, --inputnamelist=PATH...
                         List of inputnames; will iterate over them, essentially compressing multiple commands into
                           one. Experimental.

These options are normally used by create. It is probably best to look at the examples.

Logging Options:
  -v, --verbose          Specify multiple -v options to increase verbosity. For example, `-v -v -v` or `-vvv`. We map
                           ERROR or WARN -> 0 (i.e. always print), INFO -> 1(-v), DEBUG->2 (-vv)
      --log4j=(CLASS LEVEL)...
                         Customize logging configuration. Format: <classname> <level>; sets logging level of class, e.
                           g.
                          org.contentmine.ami.lookups.WikipediaDictionary INFO
Commands:
=========
  create     creates dictionaries from text, Wikimedia, etc..
  display    Displays AMI dictionaries. (Under Development)
  search     searches within dictionaries
  translate  translates dictionaries between formats
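The usage line `amidict [OPTIONS] COMMAND` means that global options such as --dictionary and --directory are given before the subcommand name. The Python argparse sketch below is a hypothetical analogue of that layout, not the picocli implementation in AMIDict.java; it also encodes the documented mapping of repeated -v flags to logging levels.

```python
import argparse

# Documented mapping: ERROR/WARN always print (0), INFO at -v (1), DEBUG at -vv (2).
LEVELS = {0: "WARN", 1: "INFO", 2: "DEBUG"}

def build_parser() -> argparse.ArgumentParser:
    """Hypothetical argparse analogue of `amidict [OPTIONS] COMMAND` (not the picocli code)."""
    parser = argparse.ArgumentParser(prog="amidict")
    # Global options are parsed before the subcommand name.
    parser.add_argument("-d", "--dictionary", nargs="+", default=[],
                        help="input or output dictionary name/s")
    parser.add_argument("--directory", help="top directory containing dictionary/s")
    parser.add_argument("-v", "--verbose", action="count", default=0,
                        help="repeat to increase verbosity")
    commands = parser.add_subparsers(dest="command", required=True)
    for name in ("create", "display", "search", "translate"):
        commands.add_parser(name)
    return parser

def log_level(verbose_count: int) -> str:
    """Translate the number of -v flags into a logging level name."""
    return LEVELS[min(verbose_count, 2)]

args = build_parser().parse_args(
    ["-vv", "-d", "animals", "plants", "--directory", "dicts", "display"])
```

One quirk of this analogue: because --dictionary accepts several values, argparse would swallow a subcommand name placed directly after the dictionary names, so the example puts --directory in between.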

schema

There is no formal schema for the dictionary yet. The current syntax is exemplified by:

<dictionary title="disease4">
<desc>Created by SPARQL</desc>
<entry description="injury caused by a bite from a snake" name="snakebite" term="snakebite" wikidataAltLabel="snake bite, snake bites, snake envenomation, snake envenoming" wikidataURL="http://www.wikidata.org/entity/Q68854" wikipediaURL="https://en.wikipedia.org/wiki/Snakebite" wikidataID="Q68854">
<synonym>snake bite</synonym>
<synonym>snake bites</synonym>
<synonym>snake envenomation</synonym>
<synonym>snake envenoming</synonym>
</entry>
...
</dictionary>
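Since there is no formal schema yet, a tool that reads a dictionary has to rely on the element and attribute names shown in this example. A minimal sketch using Python's standard-library xml.etree.ElementTree, with a shortened copy of the example (the `...` removed) and a hypothetical `read_entries` helper:

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example dictionary above.
SAMPLE = """<dictionary title="disease4">
<desc>Created by SPARQL</desc>
<entry description="injury caused by a bite from a snake" name="snakebite" term="snakebite" wikidataID="Q68854">
<synonym>snake bite</synonym>
<synonym>snake bites</synonym>
</entry>
</dictionary>"""

def read_entries(xml_text: str) -> dict:
    """Return {term: [synonyms]} for every <entry> in a dictionary document."""
    root = ET.fromstring(xml_text)
    return {
        entry.get("term"): [syn.text for syn in entry.findall("synonym")]
        for entry in root.findall("entry")
    }

entries = read_entries(SAMPLE)
```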

dictionary

The root element.

attributes

  • @title The title MUST match the root of the filename (e.g. the example above would be in disease4.xml). Titles/roots MUST be lowercase alphanumeric. We are working on the best approach to namespacing. (In the past we suggested names such as med.disease4, but it is not yet clear how this maps onto directories.)
  • xml:lang
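The two @title rules can be checked mechanically. The sketch below is a hypothetical Python check, not something amidict is known to perform; it applies the stricter "lowercase alphanumeric" rule stated here rather than the `[a-z][a-z0-9._]` pattern from the --dictionary help text.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import PurePath

def title_problems(filename: str, xml_text: str) -> list:
    """List violations of the @title rules; an empty list means none were found."""
    problems = []
    title = ET.fromstring(xml_text).get("title")
    root_name = PurePath(filename).stem  # 'disease4.xml' -> 'disease4'
    if title != root_name:
        problems.append(f"title {title!r} does not match filename root {root_name!r}")
    if title is None or not re.fullmatch(r"[a-z0-9]+", title):
        problems.append(f"title {title!r} is not lowercase alphanumeric")
    return problems
```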

child elements

  • desc (0..*); we may mandate certain types in the future
  • entry (1..*)

desc

Description of the dictionary.

attributes

  • @date date of creation/update. ISO8601 format (yyyy-mm-dd)
  • @author author of this field
  • @type type of field
  • xml:lang
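As a hypothetical example of recording an update, the Python sketch below adds a dated desc element to an existing dictionary; `datetime.date.isoformat()` produces the yyyy-mm-dd form required for @date. The helper name and the author value `jdoe` are made up, and placing new desc elements before the entries only mirrors the example above, since no ordering rule is documented.

```python
import datetime
import xml.etree.ElementTree as ET

def add_desc(dictionary: ET.Element, text: str, author: str,
             when: datetime.date) -> ET.Element:
    """Insert a <desc> with @date (ISO 8601) and @author after any existing <desc> elements."""
    desc = ET.Element("desc", {"date": when.isoformat(), "author": author})
    desc.text = text
    # Assumes existing <desc> elements come first, as in the example dictionary.
    dictionary.insert(len(dictionary.findall("desc")), desc)
    return desc

root = ET.fromstring('<dictionary title="disease4"><desc>Created by SPARQL</desc>'
                     '<entry term="snakebite"/></dictionary>')
add_desc(root, "Added synonyms", "jdoe", datetime.date(2020, 8, 6))
print(ET.tostring(root, encoding="unicode"))
```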

child elements

none

entry

A term record.

attributes

  • term MANDATORY. The term: the lead word or phrase describing the concept. Terms are case-insensitive.
  • name The name of the term. This is often identical to the term, but may be a longer phrase (e.g. a legal name) that is more formal but less useful for searching.
  • description English-language description of the term (e.g. from the Wikidata description).
  • wikidataAltLabel synonyms (alternative labels) from Wikidata, concatenated into one comma-separated string.
  • wikidataID the "P" or "Q" identifier in Wikidata. MANDATORY if Wikidata is the primary source.
  • wikidataURL the URI of the item in Wikidata. This is generally also the URL at which the item can be browsed (URI==URL).
  • wikipediaURL the URL of the English language Wikipedia page linked from Wikidata.
  • @date date of creation/update. ISO8601 format (yyyy-mm-dd)
  • @author author of this field
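Because terms are case-insensitive and synonyms can come from both synonym children and wikidataAltLabel, a search tool would typically lower-case every form before matching. This Python sketch builds such a lookup table; `build_index` is hypothetical, and splitting wikidataAltLabel on commas is an assumption based on the snakebite example.

```python
import xml.etree.ElementTree as ET

def build_index(xml_text: str) -> dict:
    """Map each lower-cased term or synonym to its entry's wikidataID (or term if there is none)."""
    index = {}
    for entry in ET.fromstring(xml_text).findall("entry"):
        term = entry.get("term")
        if term is None:
            raise ValueError("<entry> is missing the mandatory term attribute")
        key = entry.get("wikidataID") or term
        forms = [term] + [syn.text or "" for syn in entry.findall("synonym")]
        # Assumption: wikidataAltLabel is a comma-separated list, as in the snakebite example.
        forms += entry.get("wikidataAltLabel", "").split(",")
        for form in forms:
            form = form.strip().lower()
            if form:
                index[form] = key
    return index
```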

Languages

  • term.<lang> The term in language <lang>, e.g. term.hi
  • name.<lang> The name of the term in language <lang>, e.g. name.hi
  • description.<lang> The description of the term in language <lang>, e.g. description.hi
  • wikidataAltLabel.<lang> The Wikidata alternative labels (synonyms) in language <lang>, e.g. wikidataAltLabel.hi

(These may be better represented as child elements.)
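A hypothetical Python helper for reading the language-suffixed attributes, falling back to the unqualified attribute when the requested language is missing. Treating the unqualified attribute as the default (usually English) is an assumption; how ami itself resolves languages is not documented here. Namespaced attributes such as xml:lang are skipped when listing languages, because ElementTree exposes them as `{namespace}lang` keys that themselves contain dots.

```python
import xml.etree.ElementTree as ET
from typing import Optional

def localized(entry: ET.Element, attr: str, lang: str) -> Optional[str]:
    """Return attr.<lang> (e.g. term.hi) if present, else the unqualified attr (or None)."""
    return entry.get(f"{attr}.{lang}", entry.get(attr))

def languages(entry: ET.Element) -> set:
    """Language codes for which the entry has any suffixed attribute, e.g. {'hi'}."""
    return {name.split(".", 1)[1] for name in entry.attrib
            if "." in name and not name.startswith("{")}
```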

child elements

  • synonym (0..*)