
Upgrade Parser Libraries for Golang Tools, Google Summer of Code 2019

Abhishek Gaur edited this page Aug 24, 2019 · 3 revisions

Introduction

Software Package Data Exchange (SPDX) is a set of standards for communicating the components, licenses and copyrights associated with software. Software packages are accompanied by special files that hold this metadata: authors, copyrights, licenses, etc. At present, these files come in two major formats: Tag/Value and RDF. Parsing tools for these formats are available in several programming languages, such as Java and Python.

Project Overview

Currently, unlike the corresponding tools in Java and Python, tools-golang only handles SPDX files in tag-value format. As the RDF format is also officially defined by the SPDX specification, the primary objective of this project is to add RDF support, so that tools-golang can also read and write SPDX files in RDF format. Similar to tvloader and tvsaver, which handle reading and writing the tag-value format, the rdfloader and rdfsaver packages are added to handle RDF documents.

During the initial phase of the project, I improved my understanding of how RDF documents are defined and learned more about namespaces, triples (subject, predicate, object), etc., which helped me a lot later in the project. I also explored the libraries currently available for parsing RDF documents. I came across several, such as Callidon/joseki, knakk/rdf, Rdflib and goraptor, but after detailed discussion with the mentors I decided to use goraptor by [William Waites] for this project for the following reasons:

  • goraptor is the only Go library that suited the project requirements (though it has a dependency based on C).
  • Pure Go libraries like knakk/rdf lacked proper licenses.
  • Other libraries with proper licenses, like Rdflib, are written in other languages.

Finally, these libraries are utilised to add RDF support:

After finalising the libraries, I went through the goraptor documentation in more detail to figure out how to implement the parser libraries. I also referred to the [previous version](https://github.com/spdx/ATTIC-tools-go) of the tools, which was based on earlier versions of the specification, to understand how RDF support should be added.

As guided by the mentors, I implemented everything in a separate repository, understanding-goraptor, throughout the project period and made pull requests to tools-golang as I completed different features. Please find the list of all the commits I made to this repository and the list of pull requests I made to tools-golang during Google Summer of Code 2019 below:

Working of the Tools

Project Working

1. RDF Loader: Parsing RDF Documents

Reading and Parsing data from RDF document

Added the package rdfloader, which takes an RDF document as input and parses it. This is achieved with the goraptor library, which returns a goraptor.Statement consisting of a subject, predicate and object (collectively known as a triple) for each statement parsed. These triples are then processed further, and the meaningful information is extracted from them and stored in intermediate data structures defined for that purpose.
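The step from triples to an intermediate structure can be sketched as follows. This is a minimal illustration and not the rdfloader implementation: the `Triple` and `Document` types and the `buildDocument` function are hypothetical, and plain strings stand in for the terms that goraptor.Statement carries, so that the sketch runs without the goraptor dependency. Only the SPDX namespace and the property names come from the SPDX RDF vocabulary.

```go
package main

import (
	"fmt"
	"strings"
)

// Triple is a hypothetical stand-in for one parsed RDF statement.
// Plain strings are used so the sketch has no external dependencies.
type Triple struct {
	Subject, Predicate, Object string
}

// Document is a hypothetical intermediate structure, not the one
// defined in rdfloader.
type Document struct {
	Name        string
	SpecVersion string
	Creators    []string
}

// spdxNS is the namespace of the SPDX RDF vocabulary.
const spdxNS = "http://spdx.org/rdf/terms#"

// buildDocument folds a list of triples into a Document. Predicates
// outside the SPDX namespace, or not handled below, are skipped.
func buildDocument(triples []Triple) Document {
	var doc Document
	for _, t := range triples {
		if !strings.HasPrefix(t.Predicate, spdxNS) {
			continue
		}
		switch strings.TrimPrefix(t.Predicate, spdxNS) {
		case "name":
			doc.Name = t.Object
		case "specVersion":
			doc.SpecVersion = t.Object
		case "creator":
			// A property may repeat, so collect every value.
			doc.Creators = append(doc.Creators, t.Object)
		}
	}
	return doc
}

func main() {
	triples := []Triple{
		{"#SPDXRef-DOCUMENT", spdxNS + "specVersion", "SPDX-2.1"},
		{"#SPDXRef-DOCUMENT", spdxNS + "name", "example"},
		{"_:creationInfo", spdxNS + "creator", "Tool: example-tool"},
		{"#SPDXRef-DOCUMENT", "http://www.w3.org/2000/01/rdf-schema#comment", "skipped"},
	}
	fmt.Printf("%+v\n", buildDocument(triples))
}
```

Dispatching on the predicate is one simple way to organise such a loader; a real one also has to follow subjects and blank nodes to attach each value to the right package, file or creation-info record.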

2. Processing Data Structures

Passing the parsed data from intermediate to current data structures.

When parsing an RDF document, the RDF loader stores all the parsed information in an intermediate data structure. To give users more flexibility, separate scripts are also included to convert the data from the intermediate to the standard data structures and vice versa. This ensures that users can also use the loader and the saver independently of each other.

Due to the limited functionality in the current version of the tools, some of the information conveyed by an RDF document cannot be stored in the standard data structures. Defining an intermediate data structure therefore seemed to be the better option, as it makes the information easier to store and access. It will also make it easier to transfer the information between the intermediate and standard data structures in the future.
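The conversion between the two layers, and the reason it can lose information, can be illustrated with a small sketch. All names here (`IntermediateDoc`, `StandardDoc`, `Extra`, `toStandard`, `toIntermediate`) are hypothetical and do not correspond to the actual tools-golang types; `Extra` simply represents any parsed information that is assumed to have no counterpart in the standard structure.

```go
package main

import "fmt"

// IntermediateDoc is a hypothetical intermediate structure that keeps
// everything the loader parsed.
type IntermediateDoc struct {
	Name        string
	SpecVersion string
	Extra       []string // assumed to have no slot in StandardDoc
}

// StandardDoc is a hypothetical, narrower standard structure.
type StandardDoc struct {
	DocumentName string
	SPDXVersion  string
}

// toStandard copies the fields that have a counterpart and returns the
// names of the fields that had data but could not be carried over.
func toStandard(in IntermediateDoc) (StandardDoc, []string) {
	out := StandardDoc{DocumentName: in.Name, SPDXVersion: in.SpecVersion}
	var dropped []string
	if len(in.Extra) > 0 {
		dropped = append(dropped, "Extra")
	}
	return out, dropped
}

// toIntermediate is the reverse direction. Fields missing from
// StandardDoc stay empty, so a round trip is not lossless.
func toIntermediate(in StandardDoc) IntermediateDoc {
	return IntermediateDoc{Name: in.DocumentName, SpecVersion: in.SPDXVersion}
}

func main() {
	parsed := IntermediateDoc{
		Name:        "example",
		SpecVersion: "SPDX-2.1",
		Extra:       []string{"information without a standard field"},
	}
	std, dropped := toStandard(parsed)
	fmt.Printf("standard: %+v, dropped: %v\n", std, dropped)
	fmt.Printf("round trip: %+v\n", toIntermediate(std))
}
```

Reporting the dropped fields is a design choice of this sketch; it makes the loss visible to the caller instead of discarding the data silently.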

3. RDF Saver: Writing RDF documents

Saving data into RDF document

Added the package rdfsaver, which takes data as input, either parsed by the loader or coming from other tools, and writes an RDF document. The saver is implemented with goraptor.Serializer and goraptor.Namespace from the goraptor library. Because the standard data structures cannot store all the information communicated by an RDF document, the resulting output document may differ from the RDF document that was parsed. This can be improved in the future by allowing the standard data structures to receive and store more information from the intermediate data structure.

4. Using the Tools

Added examples to try out the RDF loader and saver separately. Please find below the examples of how to use these packages: