Importing Vocabularies
HIVE can import any vocabulary from an RDF/SKOS file. A vocabulary in any other format must be converted to SKOS before it can be imported.
To import a new vocabulary:
- Create a configuration file with the paths to the source file and to the stores and indexes that the HIVE import tools will generate.
- Run the AdminVocabularies class to perform the import.
Each vocabulary has its own configuration file, with the following format:
#Vocabulary data
name = NBII
longName = CSA/NBII Biocomplexity Thesaurus
uri = http://thesaurus.nbii.gov
rdf_file = /usr/local/hive/hive-data/nbii/nbii3.rdf
#Sesame Store
store = /usr/local/hive/hive-data/nbii/store
#Lucene Inverted Index
index = /usr/local/hive/hive-data/nbii/index
#Autocomplete index
autocomplete = /usr/local/hive/hive-data/nbii/autocomplete
#H2 index
h2 = /usr/local/hive/hive-data/nbii/nbiiH2
#Dummy tagger data files
lingpipe_model = /usr/local/hive/hive-data/lingpipe/postagger/models/medtagModel
#KEA and Maui data files
stopwords = /usr/local/hive/hive-data/nbii/KEA/data/stopwords/stopwords_en.txt
kea_training_set = /usr/local/hive/hive-data/nbii/KEA/train
kea_test_set = /usr/local/hive/hive-data/nbii/KEA/test
kea_model = /usr/local/hive/hive-data/nbii/KEA/nbii
maui_model = /usr/local/hive/hive-data/nbii/KEA/maui
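Before running the import, it can help to confirm that a new configuration file defines every property shown above. The following is a minimal sketch, not part of HIVE: it assumes that all keys in the NBII example are expected in every vocabulary file, and the names `parse_properties` and `missing_keys` are hypothetical helpers introduced for illustration.

```python
# Keys taken from the NBII example configuration. Treating all of them as
# required is an assumption made for this sketch, not a documented HIVE rule.
EXPECTED_KEYS = [
    "name", "longName", "uri", "rdf_file", "store", "index",
    "autocomplete", "h2", "lingpipe_model", "stopwords",
    "kea_training_set", "kea_test_set", "kea_model", "maui_model",
]


def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values such as URLs stay intact.
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_keys(props):
    """Return the expected keys that are absent or have an empty value."""
    return [key for key in EXPECTED_KEYS if not props.get(key)]


# A deliberately incomplete configuration, to show what gets reported.
sample = """
#Vocabulary data
name = NBII
uri = http://thesaurus.nbii.gov
rdf_file = /usr/local/hive/hive-data/nbii/nbii3.rdf
"""
print("Missing properties:", missing_keys(parse_properties(sample)))
```

This parser handles only the plain `key = value` form used in the example; it does not implement the full Java properties syntax (line continuations, `:` separators, escapes).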
Place the configuration file in the same directory as the "hive.properties" file. SKOSServer uses "hive.properties" to identify which vocabularies to open.
The HIVE configuration directory may look like:
conf/
agrovoc.properties
hive.properties
lcsh.properties
mesh.properties
nbii.properties
tgn.properties
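For a directory laid out like this, the per-vocabulary files can be listed with a few lines of Python. This is only a sketch: it assumes that every `*.properties` file other than `hive.properties` describes one vocabulary and that the file name (without extension) matches the vocabulary name. The function name `vocabulary_configs` is hypothetical.

```python
from pathlib import Path


def vocabulary_configs(conf_dir):
    """Return {vocabulary name: path} for each per-vocabulary properties file.

    Assumption: any *.properties file except hive.properties is a vocabulary
    configuration, named after the vocabulary it describes.
    """
    return {
        path.stem: path
        for path in sorted(Path(conf_dir).glob("*.properties"))
        if path.name != "hive.properties"
    }


# Example: list the vocabularies configured under ./conf, if it exists.
if Path("conf").is_dir():
    for name, path in vocabulary_configs("conf").items():
        print(name, "->", path)
```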
Before running AdminVocabularies, make sure Tomcat is not running. Only one process can access the HIVE index files at a time.
AdminVocabularies takes these parameters:
- Path to configuration directory
- Name of the vocabulary
- Option to activate training for the KEA algorithm (optional; if you do not train the system, you cannot use the automatic indexing classes)
The general form of the command is:
java -Djava.ext.dirs=<path to HIVE lib dir> edu.unc.ils.mrc.hive.admin.AdminVocabularies -c <path to directory with hive.properties> -v <vocabulary name> [-a | -sldktmx]
Flags:
-c <path> Path to directory that contains hive.properties
-v <name> Name of vocabulary to be initialized (e.g., agrovoc)
-s Initialize Sesame index
-l Initialize Lucene index
-d Initialize H2 database
-k Initialize KEA database
-t Train KEA
-m Train Maui
-x Initialize autocomplete
-a Initialize everything (equivalent of -sldktmx)
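The flag combinations above can be assembled programmatically, for example from a deployment script. The sketch below only builds the argument list and does not run anything. It assumes the step flags can be combined into a single argument, as the `-sldktmx` notation suggests. The function name `build_command` and the paths in the usage lines are hypothetical.

```python
STEP_FLAGS = "sldktmx"  # individual initialization and training flags


def build_command(lib_dir, conf_dir, vocabulary, steps="a"):
    """Return the AdminVocabularies invocation as an argument list.

    steps is either "a" (everything) or a combination of the letters in
    STEP_FLAGS, e.g. "sl" for the Sesame store and the Lucene index only.
    """
    if steps != "a":
        unknown = set(steps) - set(STEP_FLAGS)
        if not steps or unknown:
            raise ValueError(f"unsupported step flags: {steps!r}")
    return [
        "java",
        f"-Djava.ext.dirs={lib_dir}",
        "edu.unc.ils.mrc.hive.admin.AdminVocabularies",
        "-c", conf_dir,
        "-v", vocabulary,
        f"-{steps}",
    ]


# Hypothetical paths; substitute the locations used by your installation.
cmd = build_command("/usr/local/hive/lib", "/usr/local/hive/conf", "nbii", steps="sl")
print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once Tomcat has been stopped.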
Once the vocabulary has been loaded, you may start Tomcat and test to make sure the vocabulary is working properly.
AdminVocabularies creates the following artifacts:
- H2 database containing administrative tables for the HIVE service. If the -k flag is specified, tables are also created to support the KEA++ indexing algorithm.
- Lucene inverted index for searching. HIVE takes a document-centric approach to representing concepts in the inverted index: each concept is a document with multiple fields (e.g., preferred term, alternate terms, scope notes).
- Sesame database to store SKOS/RDF. HIVE uses a NativeStore, so vocabularies will be stored on the file system.
- Lucene autocomplete index (if the -x flag is specified)
- KEA++ (-t) and Maui (-m) statistical models used for automatic indexing.
Indexes and databases can be stored anywhere on the file system. The location of each one is defined in the vocabulary's properties file in the conf directory.
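After an import, one quick sanity check is to confirm that something now exists at each configured location. The sketch below is not part of HIVE. It takes a dictionary of property values (as read from the vocabulary's properties file) and reports, per property, whether a file or directory exists at that path or with that path as a file-name prefix. The prefix match is an assumption, made because a database path may name a file prefix, not a directory. The names `OUTPUT_KEYS` and `output_status` are hypothetical.

```python
from pathlib import Path

# Property names from the example configuration that point at generated data.
# Which of these exist depends on the flags passed to AdminVocabularies.
OUTPUT_KEYS = ["store", "index", "autocomplete", "h2", "kea_model", "maui_model"]


def output_status(props):
    """Map each configured output property to True if data exists for it.

    A path counts as present if it exists itself, or if its parent directory
    contains at least one entry whose name starts with the path's last
    component (assumption: some outputs are written as <path>.<suffix>).
    """
    status = {}
    for key in OUTPUT_KEYS:
        value = props.get(key)
        if not value:
            continue  # property not configured; nothing to check
        path = Path(value)
        prefix_match = path.parent.is_dir() and any(
            path.parent.glob(path.name + "*")
        )
        status[key] = path.exists() or prefix_match
    return status


# Example with the locations from the NBII configuration.
example = {
    "store": "/usr/local/hive/hive-data/nbii/store",
    "index": "/usr/local/hive/hive-data/nbii/index",
    "h2": "/usr/local/hive/hive-data/nbii/nbiiH2",
}
for key, present in output_status(example).items():
    print(f"{key}: {'found' if present else 'missing'}")
```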