
Maintaining and Updating Dictionaries

Ambreen H edited this page Jun 28, 2020 · 4 revisions

The purposes of the dictionaries are:

  1. to collect together the concepts we are interested in under a single label (e.g. country);
  2. to provide an identifier (ID) for each concept;
  3. to provide search terms for each concept, so that it can be located in the documents we search;
  4. to link the concept to the world's knowledge graph;
  5. to help human readers understand the concepts;
  6. to provide a record of provenance and maintenance.
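As a rough illustration of how a single dictionary can serve several of these purposes at once, here is a minimal Python sketch that writes entries to XML. The `Entry` class, the `to_xml` helper and the element and attribute names (`dictionary`, `desc`, `entry`, `id`, `term`, `wikidataID`, `description`) are hypothetical choices for this example; they are not a statement of the exact format amidict produces.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Entry:
    # Hypothetical field names, chosen for illustration only.
    entry_id: str          # purpose 2: an identifier for the concept
    term: str              # purpose 3: the search term used to locate it
    wikidata_id: str = ""  # purpose 4: link into the knowledge graph (empty = not yet filled)
    description: str = ""  # purpose 5: a short human-readable explanation


def to_xml(title: str, entries: list[Entry], provenance: str) -> str:
    """Serialise entries as one dictionary (hypothetical XML layout)."""
    # Purpose 1: every entry sits under a single dictionary label.
    root = ET.Element("dictionary", title=title)
    # Purpose 6: a free-text record of where the data came from and who maintains it.
    ET.SubElement(root, "desc").text = provenance
    for e in entries:
        ET.SubElement(root, "entry", id=e.entry_id, term=e.term,
                      wikidataID=e.wikidata_id, description=e.description)
    return ET.tostring(root, encoding="unicode")
```

For example, `to_xml("country", [Entry("CO.1", "India", "Q668", "country in South Asia")], "Hand-curated")` produces one `entry` element inside a `dictionary` titled "country"; the ID `CO.1` is made up for the example.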

This may be understood by taking the country dictionary as an example:

The country dictionary should contain all the country words that could appear in any research paper, or, better, "all the words describing countries where viral epidemics have been reported or discussed". That can be hard ("Himalayan", "North Atlantic", "Sub-Saharan", etc.), but academic papers will generally mention one or more countries specifically. @Emanuel Faria has done this for plants ("where do essential oils come from?"). For that purpose a country ("India") is too broad: we might want "Goa" or "Rajasthan". We may also have to be more specific, e.g. "Wuhan" rather than "China". But for the moment, let's work with countries.

We must also ensure that the country names in the dictionary are really recognized countries (not, for instance, ancient empires). In my opinion, the dictionary must also contain the following:

  1. All the synonyms of the country. Synonyms: yes. "England", "Scotland", "Britain" and "United Kingdom" are all widely used.
  2. All the common abbreviations. Yes, for example UK, GB and NI. Note that abbreviations often cause ambiguity.
  3. Maybe even translations into other important world languages. Translations: absolutely. If we are going to explore Hindi, we will need a term.hi attribute. Wikidata has these if they are the titles of Wikipedia pages.
  4. The current dictionary has empty entity tags for Wikipedia and Wikidata; these must be filled in so that entries can redirect to the source pages. Yes. The tags were autogenerated to show that they should be filled in by hand.
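The four requirements above can be sketched as fields on an entry, together with a matcher that treats full names and abbreviations differently. In this minimal Python sketch, the `CountryEntry` class, its field names and the `find_mentions` and `unfilled_links` helpers are all hypothetical. Matching abbreviations case-sensitively is only one possible way to reduce the ambiguity they cause; it is not something the dictionary software is known to do.

```python
import re
from dataclasses import dataclass, field


@dataclass
class CountryEntry:
    # Hypothetical structure; the field names are illustrative only.
    name: str
    synonyms: list[str] = field(default_factory=list)
    abbreviations: list[str] = field(default_factory=list)
    # Language code -> translated term, analogous to a term.hi attribute.
    translations: dict[str, str] = field(default_factory=dict)
    wikidata_id: str = ""     # empty string = placeholder still to be filled by hand
    wikipedia_page: str = ""  # empty string = placeholder still to be filled by hand


def _whole_word(term: str) -> str:
    """Build a regex that matches term only as a whole word."""
    return r"\b" + re.escape(term) + r"\b"


def find_mentions(entry: CountryEntry, text: str) -> list[str]:
    """Return the sorted dictionary terms (names, synonyms, abbreviations) found in text."""
    found = set()
    # Names and synonyms: case-insensitive, whole words only.
    for term in [entry.name, *entry.synonyms]:
        if re.search(_whole_word(term), text, flags=re.IGNORECASE):
            found.add(term)
    # Abbreviations: case-sensitive, so "UK" matches but lowercase "uk" does not.
    for abbr in entry.abbreviations:
        if re.search(_whole_word(abbr), text):
            found.add(abbr)
    return sorted(found)


def unfilled_links(entries: list[CountryEntry]) -> list[str]:
    """List entries whose Wikidata or Wikipedia fields are still empty."""
    return [e.name for e in entries if not e.wikidata_id or not e.wikipedia_page]
```

A checklist like `unfilled_links` makes it easy to see which autogenerated placeholder tags still need to be filled in by hand.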

Should I update the existing country dictionary manually with all these values?

The amidict software is, in principle, able to find Wikidata and Wikipedia links, but these are often ambiguous. In cases like countries, I expect the correct match will be the leading (first) result found. Manual checking is always required. This is an excellent task for incoming INYAS members to help with.
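One way to gather candidate links for manual checking is the public Wikidata search API (`action=wbsearchentities`). This is a swapped-in technique for illustration, not how amidict itself works, and the helper names `search_wikidata`, `candidates` and `triage` are hypothetical. The sketch proposes the leading hit and flags terms that return several hits as ambiguous, so that reviewers can look at those first; every entry still needs a human check.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://www.wikidata.org/w/api.php"


def search_wikidata(term, language="en", limit=5):
    """Query Wikidata's wbsearchentities action (needs network access)."""
    params = urllib.parse.urlencode({
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "format": "json",
        "limit": limit,
    })
    # Wikimedia asks clients to send a descriptive User-Agent (this one is made up).
    req = urllib.request.Request(
        f"{API_URL}?{params}",
        headers={"User-Agent": "dictionary-check-sketch/0.1"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def candidates(response):
    """Extract (id, label, description) tuples from a wbsearchentities response."""
    return [(hit.get("id", ""), hit.get("label", ""), hit.get("description", ""))
            for hit in response.get("search", [])]


def triage(term, response):
    """Suggest the leading hit and flag multiple hits as ambiguous.

    The flag only prioritises review; manual checking is still needed for
    every entry, including terms with no hits at all.
    """
    hits = candidates(response)
    if not hits:
        return {"term": term, "suggested": None, "ambiguous": False}
    return {"term": term, "suggested": hits[0][0], "ambiguous": len(hits) > 1}
```

For example, `triage("India", search_wikidata("India"))` would propose the first Wikidata item returned and report whether other candidates came back alongside it.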
