The purpose of the NDJSON (newline-delimited JSON) representation of Dataset-JSON is to simplify streaming large datasets. With NDJSON, a dataset can be read or written one row at a time without loading the entire dataset into memory. Most programming languages have libraries that can read a large JSON dataset as a stream, but where such a library is unavailable or performs poorly, the NDJSON format makes it easy for a program to read and write one row at a time. The NDJSON format is an alternative to the JSON format, and both are part of the Dataset-JSON standard.
The NDJSON and JSON dataset content are the same. The only difference is that the NDJSON content is written as one line of valid JSON at a time. See the Dataset-JSON NDJSON format section below for a detailed description of the NDJSON format.
In a data exchange scenario, the sender and receiver determine whether to use the JSON or NDJSON representation of Dataset-JSON. Given the relative simplicity of the Dataset-JSON specification, converting between the two formats is a straightforward process. The NDJSON example datasets were generated by converting the existing JSON Dataset-JSON example datasets.
NDJSON is a standard for delimiting JSON in stream protocols. In NDJSON, each line is a valid JSON value. Lines are delimited by the newline character (\n or 0x0A), which may be preceded by a carriage return character (\r or 0x0D). UTF-8 encoding is expected.
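The delimiting rules above can be illustrated with a minimal Python sketch of a line-by-line reader. The function name `iter_ndjson` is hypothetical and not part of this project. Skipping blank lines is an assumption made here for robustness, not a rule stated above.

```python
import io
import json
from typing import Iterator, TextIO


def iter_ndjson(stream: TextIO) -> Iterator[object]:
    """Yield one parsed JSON value per non-empty line of an NDJSON text stream."""
    for line in stream:
        # Drop the trailing "\n" and any "\r" that precedes it.
        line = line.rstrip("\r\n")
        if not line:
            continue  # assumption: tolerate blank lines instead of failing
        yield json.loads(line)


# An in-memory stream stands in for a real file here; a file would be opened
# with open(path, encoding="utf-8", newline="") so that "\r\n" is preserved.
sample = io.StringIO('{"a": 1}\r\n[1, "x", null]\n')
values = list(iter_ndjson(sample))
```

Because the function is a generator, only one line is held in memory at a time, which is the property that makes NDJSON suitable for large datasets.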
The Dataset-JSON NDJSON format is created from the Dataset-JSON standard by:
- Row 1. Create one JSON object from the metadata, including the dataset attributes and column definitions.
- Rows 2 to n. Create one JSON array per data row.
All the metadata is in the first row of the dataset and everything else is a data row.
Each row can be parsed and processed as standalone JSON.
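The two steps above can be sketched in Python as follows. This is an illustration, not the project's conversion script. It assumes the JSON representation keeps its data rows under a top-level key, passed here as the hypothetical `rows_key` parameter with a default of `"rows"`, and treats every other top-level key as metadata. The function name `write_ndjson` and the miniature dataset are also made up for the example.

```python
import io
import json
from typing import TextIO


def write_ndjson(dataset: dict, out: TextIO, rows_key: str = "rows") -> int:
    """Write a Dataset-JSON object as NDJSON and return the number of lines written."""
    # Row 1: everything except the data rows (dataset attributes, column definitions).
    metadata = {key: value for key, value in dataset.items() if key != rows_key}
    out.write(json.dumps(metadata, ensure_ascii=False) + "\n")
    count = 1
    # Rows 2 to n: one JSON array per data row.
    for row in dataset.get(rows_key, []):
        out.write(json.dumps(row, ensure_ascii=False) + "\n")
        count += 1
    return count


# Hypothetical miniature dataset, for illustration only.
example = {
    "name": "DM",
    "columns": [{"name": "USUBJID"}, {"name": "AGE"}],
    "rows": [["S-001", 34], ["S-002", 51]],
}
buffer = io.StringIO()
lines_written = write_ndjson(example, buffer)
ndjson_text = buffer.getvalue()
```

`json.dumps` without an `indent` argument never emits a raw newline, so each call produces exactly one NDJSON line even when string values contain line breaks.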
The NDJSON example datasets have been converted from the JSON versions, so they contain the same content. The examples are available in the examples/NDJSON folder and use .NDJSON as the extension.
- json2ndjson.py: Retrieves the example datasets from the CDISC DataExchange-DatasetJSON GitHub repo and converts them from JSON to NDJSON.
- ndjson2json.py: Converts the NDJSON datasets in this project back into JSON format. Used as part of round-trip testing.
- ndjson2csv: Converts the NDJSON datasets in this project into CSV format.
- validate_ndjson.py: Validates the NDJSON example files against a LinkML model.
- validate_ndjson_json_schema.py: Validates the NDJSON example files against a JSON schema generated from the LinkML model.
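For readers who want to see what the reverse conversion used in round-trip testing involves, here is a minimal Python sketch. It is not the code of ndjson2json.py. The function name `ndjson_to_dataset` is hypothetical, and placing the data rows under a `"rows"` key in the resulting JSON object is an assumption.

```python
import json
from typing import Iterable


def ndjson_to_dataset(lines: Iterable[str], rows_key: str = "rows") -> dict:
    """Rebuild a single JSON object from NDJSON lines (metadata first, then rows)."""
    stripped = (line.rstrip("\r\n") for line in lines)
    non_empty = (line for line in stripped if line)
    try:
        first = next(non_empty)
    except StopIteration:
        raise ValueError("empty NDJSON input") from None
    dataset = json.loads(first)
    if not isinstance(dataset, dict):
        raise ValueError("line 1 must be a JSON object holding the metadata")
    rows = []
    for line in non_empty:
        row = json.loads(line)
        if not isinstance(row, list):
            raise ValueError("every line after the first must be a JSON array")
        rows.append(row)
    dataset[rows_key] = rows  # assumption: rows live under this key in the JSON form
    return dataset


# Hypothetical NDJSON content, for illustration only.
ndjson_lines = [
    '{"name": "DM", "columns": [{"name": "USUBJID"}]}\n',
    '["S-001"]\n',
    '["S-002"]\n',
]
restored = ndjson_to_dataset(ndjson_lines)
```

Unlike reading NDJSON row by row, this direction collects all rows in memory, which is acceptable for testing against example datasets but gives up the streaming benefit.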