Scripts used to convert DataCite metadata and NWB metadata to JSON format
This repository contains scripts used to generate the results reported in the 2017 Society for Neuroscience conference poster described at:
http://www.abstractsonline.com/pp8/#!/4376/presentation/34934
The scripts retrieve metadata from both DataCite and NWB files and convert it to JSON so that it can then be converted to RDF (OWL) using Json2Semantic, available from:
https://github.com/jezekp/Json2Semantic
Json2Semantic is a light-weight wrapper for the Semantic-Framework tool:
https://github.com/NEUROINFORMATICS-GROUP-FAV-KIV-ZCU/Semantic-Framework
The Json2Semantic tool generates a Java model from a JSON document; the Semantic-Framework then generates an RDF document from the Java objects produced by Json2Semantic.
The scripts in this repository are organized into two directories:
datacite - scripts that download DataCite.org metadata for CRCNS.org datasets and convert it from XML to JSON, and
nwb - scripts that extract metadata from NWB files (version 1.0.6) and store it as JSON.
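As a rough illustration of the XML-to-JSON step, the following is a minimal sketch using only the Python standard library. It is not the code in the datacite directory: the sample record, the `@` and `#text` key conventions, and the function names `strip_namespace`, `element_to_dict`, and `xml_to_json` are all assumptions made for this example.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed DataCite-style record (illustration only).
SAMPLE_XML = """<resource xmlns="http://datacite.org/schema/kernel-4">
  <identifier identifierType="DOI">10.0000/EXAMPLE</identifier>
  <titles>
    <title>Example dataset title</title>
  </titles>
  <creators>
    <creator><creatorName>Doe, Jane</creatorName></creator>
    <creator><creatorName>Roe, Richard</creatorName></creator>
  </creators>
</resource>"""


def strip_namespace(tag):
    """Drop the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.split("}", 1)[-1]


def element_to_dict(elem):
    """Recursively convert an XML element into JSON-compatible data."""
    children = list(elem)
    text = (elem.text or "").strip()
    # A leaf without attributes becomes a plain string.
    if not children and not elem.attrib:
        return text
    node = {}
    # Attributes are stored under '@name' keys (an assumed convention).
    for name, value in elem.attrib.items():
        node["@" + strip_namespace(name)] = value
    if text:
        node["#text"] = text
    for child in children:
        key = strip_namespace(child.tag)
        value = element_to_dict(child)
        if key in node:
            # Repeated child tags are collected into a list.
            if not isinstance(node[key], list):
                node[key] = [node[key]]
            node[key].append(value)
        else:
            node[key] = value
    return node


def xml_to_json(xml_text):
    """Parse an XML string and return it as an indented JSON string."""
    root = ET.fromstring(xml_text)
    return json.dumps({strip_namespace(root.tag): element_to_dict(root)}, indent=2)


print(xml_to_json(SAMPLE_XML))
```

In this sketch, repeated elements such as `creator` become JSON arrays, while a single child stays a plain value; a real converter may prefer to always emit arrays for elements that the schema allows to repeat.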
Information about the poster is:
Session 174 - Systems Biology and Bioinformatics
174.05 / UU26 - Making structured neurophysiology data searchable using Semantic Web methods
Authors: *J. L. TEETERS, P. JEŽEK, S. MACKESEY, F. SOMMER; Redwood Ctr. for Theoretical Neurosci., UC Berkeley, Berkeley, CA
Abstract:
The representation of neuroscience data that enables searching and sharing between labs is challenging. Custom data structures specific to a lab need to be represented in a standardized way so that common tools can be deployed. Semantic Web technologies offer potential for representing a variety of neuroscientific data in a common format. The primary unit of data in the Semantic Web is the RDF (Resource Description Framework) triple; data represented in RDF triples is searchable with a query language called SPARQL. We present a system for mapping neuroscience metadata in two existing formats into RDF triples:
- DataCite [1] provides an infrastructure to register structured metadata describing shared datasets and is used to generate a DOI that can be used to cite datasets. Registration of a dataset requires submitting an XML document with metadata describing the dataset. Since DataCite is designed to work with data from any domain, it does not include predefined structures that are specific to neurophysiology metadata.
- The Neurodata Without Borders: Neurophysiology (NWB) format [2,3] provides standard ways of storing neurophysiology data of different types and associated metadata. The NWB format can be extended in a structured way by defining extensions using a specification language or through custom additions that are not defined using an extension.
The data used to develop these methods is available at CRCNS.org, an online repository hosting publicly available neurophysiology data.
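To make the idea of mapping JSON metadata into RDF triples concrete, here is a small sketch in plain Python. It is not how Json2Semantic or the Semantic-Framework work (those go through generated Java objects); it only shows how nested key/value metadata can be flattened into subject-predicate-object statements. The `http://example.org/` namespace, the sample record, and the function name `json_to_triples` are hypothetical, and the output is only N-Triples-like.

```python
import json
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace, illustration only


def json_to_triples(node, subject):
    """Flatten a nested JSON object into (subject, predicate, object) triples."""
    triples = []
    for key, value in node.items():
        predicate = "<%s%s>" % (BASE, quote(key))
        # Treat a single value as a one-element list so both cases share code.
        items = value if isinstance(value, list) else [value]
        for index, item in enumerate(items):
            if isinstance(item, dict):
                # A nested object becomes its own resource, linked from the parent.
                child = "%s/%s/%d" % (subject, quote(key), index)
                triples.append(("<%s>" % subject, predicate, "<%s>" % child))
                triples.extend(json_to_triples(item, child))
            else:
                # Any other value is emitted as a quoted string literal.
                triples.append(("<%s>" % subject, predicate, json.dumps(str(item))))
    return triples


# Hypothetical metadata record, not taken from a real dataset.
record = {
    "title": "Example dataset title",
    "creator": [{"name": "Doe, Jane"}, {"name": "Roe, Richard"}],
}

triples = json_to_triples(record, BASE + "dataset/1")
for s, p, o in triples:
    print(s, p, o, ".")
```

Once metadata is in this form, a SPARQL query amounts to matching patterns over the triples, for example finding every subject that has a `name` predicate with a given literal value.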