amidict
`amidict` is a command for creating, modifying or analyzing dictionaries. To use the dictionaries in searching, use:
ami ... search --dictionary <dict1> [<dict2>...]
`amidict` is a wrapper command that runs `AMIDict.java`. It calls `main` in `org.contentmine.ami.tools.AMIDict` directly, which hands the command line to the `picocli` parser and then dispatches to the appropriate subcommand, each implemented as a subclass of `AbstractAMITool`. Although `ami` and `amidict` are distributed in the same program (`ami3`) and both use `picocli` for command-line parsing, they are separate commands and may use different mnemonics.
`amidict` has several subcommands:
- amidict create creates dictionaries from wordlists, Wikipedia, Wikidata etc.
- amidict display displays dictionaries and validates them
- amidict search searches dictionaries for terms
- amidict translate translates between formats (and later human languages using Wikidata)
Several `picocli` options are required to control the input and output, and there are not many defaults. For example, to create a dictionary you probably need a `directory` (where to put it) and a `dictionary` (what to call the dictionary).
Examples:
- `dictionary`: the name of the dictionary (not the filename, though it may be used to read or create the filename).
- `directory`: the directory (folder) where the dictionary/s are to be found (`display`, `search`, `translate`) or to be created (`create`).
We may try to streamline these options later.
See `AMIDict.java` and `AbstractAMIDictTool.java` in `org.contentmine.ami.tools/`, and the subcommands in `org.contentmine.ami.tools.dictionary/`.
Generally `amidict` will be called once to create a dictionary and then at irregular intervals to update and annotate it. It does not interface directly with the `ami` system of chainable commands, which uses the dictionaries in `ami ... search --dictionary ...`.
Created 2020-07-09 by amidict --help
Note. These options are needed for the subcommands
Usage: amidict [OPTIONS] COMMAND
`amidict` is a command suite for managing dictionary:
Parameters:
===========
[@<filename>...] One or more argument files containing options.
Options:
========
-d, --dictionary=<dictionaryList>...
input or output dictionary name/s. for 'create' must be singular; when 'display' or
'translate', any number. Names should be lowercase, unique. [a-z][a-z0-9._]. Dots can be
used to structure dictionaries into directories. Dictionary names are relative to
'directory'. If <directory> is absent then dictionary names are absolute.
`dictionary` values (without suffix) are normally the way to refer to a dictionary.
--directory=<directory>
top directory containing dictionary/s. Subdirectories will use structured names (NYI). Thus
dictionary 'animals' is found in '<directory>/animals.xml', while 'plants.parts' is found in
<directory>/plants/parts.xml. Required for relative dictionary names.
The `directory` folder is often the place where all of a user's dictionaries are located.
-h, --help Show this help message and exit.
-V, --version Print version information and exit.
General Options:
-i, --input=FILE Input filename, containing input for dictionary. its basename becomes the inputname
Mainly used by create
-n, --inputname=PATH User's basename for inputfiles (e.g. foo/bar/<basename>.txt). The default name for the
dictionary; required for 'terms`
-L, --inputnamelist=PATH...
List of inputnames; will iterate over them, essentially compressing multiple commands into
one. Experimental.
These are normally used by `create`. It is probably best to look at examples.
Logging Options:
-v, --verbose Specify multiple -v options to increase verbosity. For example, `-v -v -v` or `-vvv`. We map
ERROR or WARN -> 0 (i.e. always print), INFO -> 1(-v), DEBUG->2 (-vv)
--log4j=(CLASS LEVEL)...
Customize logging configuration. Format: <classname> <level>; sets logging level of class,
e.g. org.contentmine.ami.lookups.WikipediaDictionary INFO
Commands:
=========
create creates dictionaries from text, Wikimedia, etc..
display Displays AMI dictionaries. (Under Development)
search searches within dictionaries
translate translates dictionaries between formats
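The naming rule described under `--dictionary` and `--directory` above can be sketched in a few lines of Python. This is an illustration only, not the AMIDict implementation (the help text itself marks structured subdirectories as NYI). The helper name `dictionary_path` is hypothetical, and the regular expression is one assumed reading of the help text's `[a-z][a-z0-9._]`.

```python
import re
from pathlib import PurePosixPath

# Assumed reading of "[a-z][a-z0-9._]" in the help text: a lowercase letter
# followed by any number of lowercase letters, digits, dots or underscores.
NAME_PATTERN = re.compile(r"[a-z][a-z0-9._]*")


def dictionary_path(directory: str, name: str) -> PurePosixPath:
    """Map a dictionary name to an XML path under `directory` (hypothetical helper)."""
    if not NAME_PATTERN.fullmatch(name):
        raise ValueError(f"invalid dictionary name: {name!r}")
    parts = name.split(".")
    if "" in parts:
        # Rejects empty segments such as "plants..parts" or a trailing dot.
        raise ValueError(f"empty segment in dictionary name: {name!r}")
    *folders, leaf = parts
    # Dots structure the name into subdirectories; the last segment is the file.
    return PurePosixPath(directory, *folders, leaf + ".xml")


print(dictionary_path("mydicts", "animals"))       # mydicts/animals.xml
print(dictionary_path("mydicts", "plants.parts"))  # mydicts/plants/parts.xml
```

The sketch always requires a directory; how absolute dictionary names are handled when `--directory` is absent is left out because the help text does not spell it out.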
There is no formal schema for the dictionary yet. The current syntax is exemplified by:
<dictionary title="disease4">
  <desc>Created by SPARQL</desc>
  <entry description="injury caused by a bite from a snake" name="snakebite" term="snakebite" wikidataAltLabel="snake bite, snake bites, snake envenomation, snake envenoming" wikidataURL="http://www.wikidata.org/entity/Q68854" wikipediaURL="https://en.wikipedia.org/wiki/Snakebite" wikidataID="Q68854">
    <synonym>snake bite</synonym>
    <synonym>snake bites</synonym>
    <synonym>snake envenomation</synonym>
    <synonym>snake envenoming</synonym>
  </entry>
  ...
</dictionary>
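As a rough illustration of how a dictionary in this syntax might be read outside of `ami`, the following Python sketch collects each entry's term and synonyms using only the standard library. The `SAMPLE` string is a shortened copy of the example above, and `load_terms` is a hypothetical helper, not part of AMIDict. Lowercasing the term is an assumption based on this page's statement that terms are case insensitive.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Shortened copy of the example dictionary, for illustration only.
SAMPLE = """<dictionary title="disease4">
  <desc>Created by SPARQL</desc>
  <entry description="injury caused by a bite from a snake" name="snakebite" term="snakebite" wikidataID="Q68854">
    <synonym>snake bite</synonym>
    <synonym>snake bites</synonym>
  </entry>
</dictionary>"""


def load_terms(xml_text: str) -> Dict[str, List[str]]:
    """Return {term: [synonym, ...]} for every <entry> that has a term attribute."""
    root = ET.fromstring(xml_text)
    terms: Dict[str, List[str]] = {}
    for entry in root.findall("entry"):
        term = entry.get("term")
        if term is None:
            continue  # skip entries without a term
        synonyms = [s.text.strip() for s in entry.findall("synonym") if s.text]
        # Assumption: normalise to lowercase because terms are case insensitive.
        terms[term.lower()] = synonyms
    return terms


terms = load_terms(SAMPLE)
print(terms)  # {'snakebite': ['snake bite', 'snake bites']}
```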
`dictionary`: the root element.
Attributes:
- `@title` The title MUST match the root of the filename (e.g. this one would be in `disease4.xml`). Titles/roots MUST be lowercase alphanumeric. We are working on the best approach to namespacing. (In the past we suggested `med.disease4`, etc., but we are not sure how this maps onto directories.)
- `xml:lang`
Children:
- `desc` (0..*), but we may mandate certain types in the future
- `entry` (1..*)
`desc`: description of the dictionary.
Attributes:
- `@date` date of creation/update, in ISO 8601 format (yyyy-mm-dd)
- `@author` author of this field
- `@type` type of field
- `xml:lang`
Children: none
`entry`: a term record.
Attributes:
- `term` MANDATORY. The term: the lead word or phrase describing the concept. This is case insensitive.
- `name` The name of the term. This is often identical to the term, but may be a longer phrase (e.g. a legal name) which is more formal but less useful for searching.
- `description` English language description of the term (e.g. from the Wikidata description)
- `wikidataAltLabel` a series of concatenated synonyms from Wikidata
- `wikidataID` the "P" or "Q" identifier in Wikidata. MANDATORY if Wikidata is the primary source.
- `wikidataURL` the URI in Wikidata. This is generally the URL at which Wikidata can be browsed (URI==URL).
- `wikipediaURL` the URL of the English language Wikipedia page linked from Wikidata.
- `@date` date of creation/update, in ISO 8601 format (yyyy-mm-dd)
- `@author` author of this field
Languages:
- `term.<lang>` The term in language `<lang>`, e.g. `term.hi`
- `name.<lang>` The name of the term in language `<lang>`, e.g. `name.hi`
- `description.<lang>` The description of the term in language `<lang>`, e.g. `description.hi`
- `wikidataAltLabel.<lang>` The Wikidata alternative labels in language `<lang>`, e.g. `wikidataAltLabel.hi`

(These may be better as children.)
Children:
- `synonym` (0..*)
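Since there is no formal schema yet, the constraints listed above can only be checked informally. The following Python sketch shows one way to do that. It is not the validation performed by `amidict display`. The function name `check_dictionary` and both regular expressions are assumptions made for illustration, and only a few of the rules are covered (title, at least one entry, mandatory `term`, shape of `wikidataID`).

```python
import re
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List

# Assumed patterns: "lowercase alphanumeric" titles and "P"/"Q" Wikidata identifiers.
TITLE_PATTERN = re.compile(r"[a-z0-9]+")
WIKIDATA_ID_PATTERN = re.compile(r"[PQ][0-9]+")


def check_dictionary(xml_text: str, filename: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems: List[str] = []
    root = ET.fromstring(xml_text)
    if root.tag != "dictionary":
        problems.append(f"root element is <{root.tag}>, expected <dictionary>")
    title = root.get("title", "")
    if not TITLE_PATTERN.fullmatch(title):
        problems.append(f"title {title!r} is not lowercase alphanumeric")
    if title != Path(filename).stem:
        problems.append(f"title {title!r} does not match filename root {Path(filename).stem!r}")
    entries = root.findall("entry")
    if not entries:
        problems.append("no <entry> elements (expected 1..*)")
    for index, entry in enumerate(entries, start=1):
        if not entry.get("term"):
            problems.append(f"entry {index} has no term attribute")
        wikidata_id = entry.get("wikidataID")
        if wikidata_id is not None and not WIKIDATA_ID_PATTERN.fullmatch(wikidata_id):
            problems.append(f"entry {index} has malformed wikidataID {wikidata_id!r}")
    return problems


good = '<dictionary title="disease4"><entry term="snakebite" wikidataID="Q68854"/></dictionary>'
bad = '<dictionary title="Disease4"><entry name="unnamed"/></dictionary>'
print(check_dictionary(good, "disease4.xml"))
for problem in check_dictionary(bad, "disease4.xml"):
    print(problem)
```

Returning a list of problems rather than raising on the first one makes it easier to report everything wrong with a hand-edited dictionary in a single pass.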