The Geological Survey of Queensland (GSQ) publishes vocabularies: a way to describe things and the relationships between them.
A vocabulary is a set of agreed terms:
- In GSQ, a vocabulary defines the terms used to describe and represent things in the domain of economic resources, geoscience and data management.
- Vocabularies align information within a business area or across systems.
- Vocabularies can be very complex (with thousands of terms) or very simple (describing one or two concepts only).
For more detailed information, read Why Vocabularies? and the other pages in the Vocabularies Wiki.
Fig. 1: Vocabulary context diagram
- We use tools such as Vocbench or Excel to create vocabularies using the structure of the Simple Knowledge Organization System (SKOS) language. This structure provides a framework for describing and relating concepts in a common language, so that everyone understands a concept precisely and in the same way. See also the SKOS Primer for the basics.
- The native format for a vocabulary is a TTL (Turtle) file: a plain text file with a .ttl extension, written in the Turtle syntax. The file contains Resource Description Framework (RDF) triples. Each triple is similar to an English 'subject > predicate > object' statement.
- We use GitHub (where you are now) to store and manage versions of the vocabulary TTL files. GitHub also provides the workflow functionality used to approve vocabularies. Read the GitHub getting started guide.
- We import the TTL files into GraphDB, a triple store that lets us query the triples.
- VocPrez presents our vocabs on the web for people and computers to read. VocPrez pulls the triples from GraphDB to create a cache of the vocabularies.
- Other systems, such as GeoProperties and the Data Portals, pull their values from VocPrez. This ensures that the attributes used to describe a dataset come from the controlled vocabulary.
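To make the 'subject > predicate > object' idea concrete, the following minimal Python sketch (standard library only) writes one SKOS concept as four RDF triples in N-Triples, a simple line-based RDF format that is also valid Turtle. The concept scheme URI under linked.data.gov.au and the concept names are hypothetical, chosen for illustration only.

```python
# Minimal sketch: one SKOS concept as RDF triples, serialised as N-Triples
# (also valid Turtle). The concept scheme URI below is HYPOTHETICAL.
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical concept scheme URI under linked.data.gov.au, for illustration only.
SCHEME = "http://linked.data.gov.au/def/example-vocab"


def iri(uri):
    """Wrap a URI in angle brackets, as N-Triples and Turtle require."""
    return f"<{uri}>"


def literal(text, lang="en"):
    """Return a language-tagged literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


sandstone = iri(SCHEME + "/sandstone")

# Each tuple is one (subject, predicate, object) statement.
triples = [
    (sandstone, iri(RDF_TYPE), iri(SKOS + "Concept")),
    (sandstone, iri(SKOS + "prefLabel"), literal("sandstone")),
    (sandstone, iri(SKOS + "broader"), iri(SCHEME + "/sedimentary-rock")),
    (sandstone, iri(SKOS + "inScheme"), iri(SCHEME)),
]


def to_ntriples(statements):
    """Write one triple per line, each terminated by ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in statements)


print(to_ntriples(triples))
```

A hand-written Turtle file would usually declare prefixes such as skos: to shorten these URIs; N-Triples simply spells every URI out in full.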
Fig. 2: Vocabulary build and pull process
- Search for existing international, national and industry standards. Use them directly where possible, augment and adapt them when needed, and create a new vocabulary only as a last resort (see below for links to existing vocabularies).
- Create the vocabulary using the Simple Knowledge Organization System (SKOS)
- See also the SKOS Primer for the basics
- Use one of these tools:
- Vocbench to create the vocabulary.
- Excel templates - GSQ's Excel SKOS Vocabulary Builder.
- A text editor, e.g. Visual Studio Code
- For VS Code, use the extension Language Support for RDF related language syntax to get formatting support.
- Allocate a URI to the vocab
- We use linked.data.gov.au for all GSQ vocabs
- Arrange URI allocation via the Contacts below
- Export the vocabulary to a TTL file
- If using Vocbench, it is easier to export the TTL from the Build repository in GraphDB. Follow the instructions here.
- Validate the TTL file
- Use the online Skosify tool, which tests for SKOS conformance
- Tick the checkbox Keep skos:related relationships within the same hierarchy and leave the other checkboxes unticked
- And/or use the online SKOS testing tool, which tests for SKOS conformance, missing language tags, valid label rules and notation uniqueness
- Then use the GSQ Vocab SHACL Shapes files to validate the vocab
- Add the TTL file to a development branch in GitHub named dev-vocabularyName. See the how-to instructions here.
- Submit a pull request to the vocabularies repository
- Create a branch for your vocab named review-vocabularyName
- Add your vocab to that branch
- Create a pull request from that review- branch to the master branch and nominate reviewers
- Once 2+ reviews have passed (usually a data management staff member and a science domain expert), the final reviewer will merge the review- branch into the master branch and delete the review- branch
- From here onwards, publication of the vocab to the production VocPrez instance is automated
- You should see the vocab at https://vocabs.gsq.digital/vocabulary/ within hours of approval and merge
- If the vocab needs testing in VocPrez, the test instance is used: https://test.vocabs.gsq.digital/vocabulary/
- During UAT, the vocabularies rendered in the test systems are derived from the files presented at https://vocabs.uat.gsq.digital/vocabulary
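The validation tools above are external services. As a rough illustration of what some of their checks mean, the following Python sketch tests a list of triples for prefLabels without a language tag, more than one prefLabel per concept and language, and notations shared by several concepts. It is a hypothetical simplification, not the Skosify tool, the SKOS testing tool or the GSQ SHACL shapes, and it assumes literals are represented as (value, language tag) pairs.

```python
# Hypothetical simplification of a few SKOS checks; NOT the Skosify tool,
# the SKOS testing tool or the GSQ SHACL shapes.
from collections import defaultdict

SKOS = "http://www.w3.org/2004/02/skos/core#"
PREF_LABEL = SKOS + "prefLabel"
NOTATION = SKOS + "notation"


def find_problems(triples):
    """Return human-readable problems found in (subject, predicate, object) triples.

    Literal objects are assumed to be (value, language_tag) pairs, with
    language_tag set to None when the literal has no tag.
    """
    problems = []
    labels_per_language = defaultdict(int)  # (concept, language) -> count
    concepts_per_notation = defaultdict(set)  # notation -> concepts using it
    for subject, predicate, obj in triples:
        if predicate == PREF_LABEL:
            value, lang = obj
            if lang is None:
                problems.append(f"{subject}: prefLabel {value!r} has no language tag")
            labels_per_language[(subject, lang)] += 1
        elif predicate == NOTATION:
            concepts_per_notation[obj[0]].add(subject)
    # SKOS allows at most one prefLabel per language tag for each resource.
    for (subject, lang), count in labels_per_language.items():
        if count > 1:
            problems.append(f"{subject}: {count} prefLabels for language {lang!r}")
    # A notation should identify a single concept within the vocabulary.
    for notation, subjects in concepts_per_notation.items():
        if len(subjects) > 1:
            problems.append(f"notation {notation!r} is used by {len(subjects)} concepts")
    return problems


# Example with hypothetical concepts: one label lacks a tag, one notation is shared.
EX = "http://linked.data.gov.au/def/example-vocab/"
triples = [
    (EX + "sandstone", PREF_LABEL, ("sandstone", "en")),
    (EX + "sandstone", NOTATION, ("SST", None)),
    (EX + "siltstone", PREF_LABEL, ("siltstone", None)),
    (EX + "siltstone", NOTATION, ("SST", None)),
]
for problem in find_problems(triples):
    print(problem)
```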
The steps outlined above are shown as a workflow on the Vocabulary Review Workflow wiki page.
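The branch part of the review steps can also be scripted. This hedged sketch uses Python's subprocess module to run the git commands that put a vocab on its own review- branch and push it, and by default only prints them (dry run). The vocab name, file path and commit message are hypothetical, and it assumes a local clone whose remote is named origin.

```python
# Hedged sketch of the review-branch steps. Vocab name, path and commit
# message are HYPOTHETICAL; assumes a clone whose remote is named "origin".
import shlex
import subprocess


def review_branch_commands(vocab_name, ttl_path):
    """Return the git commands that put one vocab file on a review- branch."""
    branch = f"review-{vocab_name}"
    return [
        ["git", "checkout", "master"],
        ["git", "pull"],
        ["git", "checkout", "-b", branch],
        ["git", "add", ttl_path],
        ["git", "commit", "-m", f"Add {vocab_name} vocabulary for review"],
        ["git", "push", "-u", "origin", branch],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in commands:
        print(shlex.join(command))
        if not dry_run:
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Dry run: prints the commands without touching the repository.
    run(review_branch_commands("exampleVocab", "vocabularies/exampleVocab.ttl"))
```

After the push, the pull request from the review- branch to master is still opened on GitHub, where reviewers are nominated.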
- Geoscience Australia http://ldweb.ga.gov.au/def/voc/ga/
- CGI Vocabularies Register http://resource.geosciml.org/def/voc/
- EarthResourceML Vocabularies http://resource.geosciml.org/def/voc/ (scroll down)
- Research Vocabularies Australia https://vocabs.ands.org.au/
- Linked Open Vocabularies https://lov.linkeddata.es/dataset/lov
- ISO 19115 https://www.ngdc.noaa.gov/wiki/index.php/ISO_19115_and_19115-2_CodeList_Dictionaries
- Basel Register https://bartoc.org/
- British Geological Survey https://www.bgs.ac.uk/data/vocabularies/home.cfm
- INSPIRE Code Lists http://inspire.ec.europa.eu/codelist
- NERC Vocabulary Server http://vocab.nerc.ac.uk/collection/
- Best practice in formalizing a SKOS vocabulary https://confluence.csiro.au/public/VOCAB/vocabulary-services/publishing-vocabularies/best-practice-in-formalizing-a-skos-vocabulary
- vocabularies/ - all GSQ's vocabularies, in RDF (Turtle) text files
- shapes/ - SHACL graph shape files used to validate vocab files before publication
- scripts/ - Python scripts to dump/load a GraphDB instance with these vocab files
- templates/ - Excel and other tools to help with vocab creation
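The scripts/ folder holds the Python scripts that load and dump GraphDB. Those scripts are not reproduced here. Instead, this is a minimal sketch, assuming a GraphDB server at http://localhost:7200 with a repository called vocabs (both hypothetical), of the two kinds of request involved. It uses the RDF4J REST protocol that GraphDB implements: POST Turtle to /repositories/{id}/statements to load triples, and GET /repositories/{id}?query=... to run a SPARQL query.

```python
# Minimal sketch of GraphDB load/query requests over the RDF4J REST protocol.
# Server URL and repository ID are HYPOTHETICAL; this is not the scripts/ code.
import json
import urllib.parse
import urllib.request

GRAPHDB = "http://localhost:7200"  # hypothetical GraphDB server
REPO = "vocabs"  # hypothetical repository ID

# SPARQL query listing a few concepts and their preferred labels.
CONCEPT_LABELS = """PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE {
  ?concept a skos:Concept ;
           skos:prefLabel ?label .
}
LIMIT 10"""


def load_request(ttl_text):
    """Build a POST that adds the Turtle statements to the repository."""
    return urllib.request.Request(
        f"{GRAPHDB}/repositories/{REPO}/statements",
        data=ttl_text.encode("utf-8"),
        headers={"Content-Type": "text/turtle"},
        method="POST",
    )


def query_request(sparql):
    """Build a GET that runs a SPARQL SELECT query and asks for JSON results."""
    params = urllib.parse.urlencode({"query": sparql})
    return urllib.request.Request(
        f"{GRAPHDB}/repositories/{REPO}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )


def send(request):
    """Send a request; return parsed JSON, or None for an empty response body."""
    with urllib.request.urlopen(request) as response:
        body = response.read()
    return json.loads(body) if body else None


# Usage against a running server (not executed here; file path is hypothetical):
# with open("vocabularies/example-vocab.ttl", encoding="utf-8") as f:
#     send(load_request(f.read()))
# for row in send(query_request(CONCEPT_LABELS))["results"]["bindings"]:
#     print(row["concept"]["value"], row["label"]["value"])
```

Using only urllib keeps the sketch dependency-free. A real loader may also need to handle authentication and HTTP errors, depending on how the server is configured.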
The content of this code repository is licensed under the Creative Commons Attribution 4.0 International licence (CC BY 4.0). The licence deed is stored in this repository: LICENSE.
Geoscience Information Team, Geological Survey of Queensland, Department of Resources, Brisbane, QLD, Australia, [email protected]