Maintaining and Updating Dictionaries
- to collect together concepts we are interested in under a single label (e.g. country)
- to provide an identifier for each concept
- to provide search terms for each concept to locate it in the documents we search
- to link the concept to the world's knowledge graph.
- to help human readers understand the concepts.
- to provide a record of provenance and maintenance
All the words that could appear in any research paper must be available within our country dictionary, or better, "all the words describing countries where viral epidemics have been reported/discussed". That can be hard ("Himalayan", "North Atlantic", "Sub-Saharan", etc.), but academic papers will generally mention one or more countries specifically. @Emanuel Faria has done this for plants ("where do essential oils come from?"). For that, a country ("India") is too broad: we might want "Goa" or "Rajasthan". We may have to be more specific here too, for example "Wuhan" rather than "China". But for the moment let's work with countries.
We must also ensure that the country names in the dictionary are really recognized countries (not, for instance, ancient empires). In my opinion, the dictionary must also contain the following:
- All the synonyms of the country: Yes. "England", "Scotland", "Britain", "United Kingdom" are all widely used.
- All the common abbreviations: Yes. UK, GB, NI, for example. Abbreviations often cause ambiguity.
- Maybe even translations in other important world languages: Absolutely. If we are going to explore Hindi we will need a `term.hi` attribute. Wikidata has these if they are the titles of Wikipedia pages.
- The current dictionary has empty entity tags for Wikipedia as well as Wikidata, which must also be present for redirection to the source pages: Yes. The tags were autogenerated to show they should be filled by hand.
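To make the list above concrete, here is a minimal sketch of what such entries could look like when built with Python's standard `xml.etree.ElementTree`. Only the `term.hi` attribute name comes from the discussion above; the `dictionary`, `entry` and `synonym` element names, the `wikidata` and `wikipedia` attribute names, and the `search_terms` helper are assumptions for illustration and may not match the real dictionary schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: every name except "term.hi" is an assumption.
dictionary = ET.Element("dictionary", {"title": "country"})

# One entry with synonyms and abbreviations; link attributes left empty
# to signal that they still need to be filled in by hand.
uk = ET.SubElement(
    dictionary, "entry",
    {"term": "United Kingdom", "wikidata": "", "wikipedia": ""},
)
for text in ("Britain", "England", "Scotland", "UK", "GB"):
    ET.SubElement(uk, "synonym").text = text

# One entry carrying a Hindi translation in a term.hi attribute.
india = ET.SubElement(
    dictionary, "entry",
    {"term": "India", "term.hi": "भारत", "wikidata": "", "wikipedia": ""},
)


def search_terms(entry, lang=None):
    """Return the terms to search for: the main term plus synonyms,
    or only the translated term when a language code is given."""
    key = "term" if lang is None else f"term.{lang}"
    terms = []
    if entry.get(key):
        terms.append(entry.get(key))
    if lang is None:
        terms.extend(s.text for s in entry.findall("synonym") if s.text)
    return terms


print(ET.tostring(dictionary, encoding="unicode"))
print(search_terms(uk))
print(search_terms(india, lang="hi"))
```

Short abbreviations such as "GB" or "NI" would probably need case-sensitive, whole-word matching when used as search terms, since they are the most likely source of ambiguity.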
The amidict software is, in principle, able to find Wikidata and Wikipedia links, but these are often ambiguous. In cases like country, I expect the correct link will be the leading one found. Manual checking is always required. This is an excellent thing for incoming INYAS to help with.
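As an aid to that manual checking, a small script could list the entries whose links are still empty. This is only a sketch and not part of amidict: the XML layout, the `wikidata` and `wikipedia` attribute names, and the `entries_needing_links` function are assumptions, and the sample data is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only; the real dictionary layout may differ.
SAMPLE = """<dictionary title="country">
  <entry term="India" wikidata="Q668" wikipedia="India"/>
  <entry term="United Kingdom" wikidata="" wikipedia=""/>
  <entry term="China" wikidata="Q148" wikipedia=""/>
</dictionary>"""

LINK_ATTRIBUTES = ("wikidata", "wikipedia")  # assumed attribute names


def entries_needing_links(xml_text):
    """Return (term, missing_attributes) for each entry that has at
    least one empty or absent link attribute."""
    root = ET.fromstring(xml_text)
    missing = []
    for entry in root.iter("entry"):
        gaps = [a for a in LINK_ATTRIBUTES if not entry.get(a, "").strip()]
        if gaps:
            missing.append((entry.get("term"), gaps))
    return missing


for term, gaps in entries_needing_links(SAMPLE):
    print(f"{term}: missing {', '.join(gaps)}")
```

A report like this only shows which entries are incomplete. Whether a filled-in link points at the right page still has to be confirmed by a human reader.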