-
Notifications
You must be signed in to change notification settings - Fork 0
VocabularyConversion
This document describes the process used to convert a vocabulary to SKOS RDF/XML for import into HIVE.
SKOS versions of LCSH can be downloaded directly from the Library of Congress. No conversion is required.
-
http://id.loc.gov/download/
- Select "LCSH RDF/XML"
SKOS versions of AGROVOC can be downloaded directly from the FAO:
The NBII Biocomplexity Thesaurus is available as a SKOS web-service. The NBII data is also available for direct download in CSV and XML format. XML is currently used by HIVE for SKOS conversion.
-
http://my.nbii.gov
- Directory > Browse directory
- NBII Thesaurus
- repgen.csv (3/6/2009)
- repgen.xml (3/6/2009)
- Latest update: 3/6/2009 as of 1/26/2011
- NBII Thesaurus
- Directory > Browse directory
- Open SKOSCoverter project
- Edit utils.myreaders.xml.NBIIXMLReader
- In main(), specify path to repgen.xml and output directory for nbii.rdf
- Run NBIIXMLReader
- Edit utils.myreaders.xml.NBIIXMLReader
- Note: NBII
Integrated Taxinomic Information System (ITIS) is available in a variety of formats. MySQL is currently used by HIVE for SKOS conversion.
- Requires MySQL
- Download ITIS (http://www.itis.gov/downloads/itisMySQL121610_v3.TAR.gz)
- Extract archive
- Follow instructions in README.txt to import ITIS data
- Creates database ITIS
- Open ITIS2SKOS project
- Modify SKOSGenerator:
- Specify itis.rdf output directory
- Specify MySQL connection information
edu.unc.ils.mrc.hive.converter.itis.ITISConverter
TGN is available in relational and XML formats. Access to source TGN vocabularies requires licensing from the Getty. After you receive your TGN access information
- Extract tgn_rel_10_utf8_zip
- Open TGN2SKOS
- Edit TGNReader, update path to source directory and output path
edu.unc.ils.mrc.hive.converter.tgn.TGNReader
SKOS versions of LTER can be downloaded directly subversion:
Note: The SKOS version of MeSH available from http://thesauri.cs.vu.nl/eswc06 is not compatible with HIVE. A new converter has been written based on the MeSH XML source files.
MeSH is available for download in XML format. Access to MeSH vocabularies requires submitting an agreement form.
- Download desc2011.xml from http://www.nlm.nih.gov/mesh/filelist.html
java -Xmx1024m edu.unc.ils.mrc.hive.converter.mesh.MeshConverter <path_to_desc2001_xml> <path_to_output_RDF>
- Conversion requires ~1GB Ram. Specify java options -Xmx1024m