Skip to content

Latest commit

 

History

History
168 lines (133 loc) · 7.21 KB

README.org

File metadata and controls

168 lines (133 loc) · 7.21 KB

NEAD - Non-binary Environmental Archive Data - format specification

Table of contents

Introduction

This document describes the NEAD (Non-binary Environmental Archive Data) file format.

The NEAD format is a delimiter separated value (DSV) format similar to CSV, but with a header section, followed by the data section (example). This document describes the format using Extended Bakus-Naur form (EBNF) syntax (below) or graphically at https://raw.githubusercontent.com/GEUS-PROMICE/NEAD/main/NEAD.svg

Note - Here we use the EBNF notation from the W3C Extensible Markup Language (XML), not the ISO/IEC 14977:1996 EBNF standard. The W3C XML standard is implemented and parsed by the Railroad Diagram Generator (https://github.com/GuntherRademacher/rr) that is used to make the graphics shown in this document.

WARNING

  • This file format is under development.

Citation

Ionut Iosifescu Enescu; Mathias Bavay; Kenneth Mankoff (2020). New Environmental Data Archive (NEAD) format. EnviDat. doi: 10.16904/envidat.187

File format: EBNF syntax

Click on the example graphic below to see the full graphical representation of the EBNF.

./example.png

The graphical EBNF form is generated from the following EBNF syntax (from https://www.w3.org/TR/xml/#sec-notation):

NEAD                  ::= firstline header_section data_section
firstline             ::= '#' ' ' 'NEAD' ' ' version_number ' ' file_format newline
header_section        ::= metadata_header metadata_section fields_header fields_section

/* metadata */
metadata_header       ::= '#' ' ' '[METADATA]' newline
metadata_section      ::= (('#' ((whitespace (required_metadata |
                                              recommended_metadata |
                                              other_metadata))? )? lineend ) | newline)+
required_metadata     ::= ('field_delimiter' assignment field_delimiter) |
                          ('geometry' assignment geometry) |
                          ('srid' assignment EPSG_code)
recommended_metadata  ::= ('station_id' assignment alphanumeric)
recommended_metadata  ::= ('timestamp_meaning' assignment timestamp_meanings)
recommended_metadata  ::= 'nodata' assignment (integer | float)
recommended_metadata  ::= 'timezone' assignment (integer | float | tz_string)
recommended_metadata  ::= ('doi' | 'reference') assignment value
other_metadata        ::= key assignment value

/* fields */
fields_header         ::= '#' ' ' '[FIELDS]' newline
fields_section        ::= (('#' ((whitespace ( required_fields |
                                               recommended_fields |
                                               other_fields ))? )? lineend ) | newline)+
required_fields       ::= ('fields' assignment values)
recommended_fields    ::= ('units_multiplier' |
                           'units_offset' |
                           'units' |
                           'long_name' |
                           'standard_name') assignment values
recommended_fields    ::= 'timestamp_meaning' assignment timestamp_meanings
other_fields          ::= key assignment values

/* data */
data_section          ::= '#' ' ' '[DATA]' newline dataline+
dataline              ::= (value ( field_delimiter value )* newline)
values                ::= value ( field_delimiter whitespace value )*

/*
  NOTE:
  All values must be the same length.
  This means everything in the "[FIELDS]" section maps 1:1 to the columns in the "[DATA]" section.
*/


/* other */
key             ::= char (alphanumeric*)?
value           ::= unicode-char*
assignment      ::= whitespace? '=' whitespace?
field_delimiter ::= [,|\/:;]
version_number  ::= digit* ('.' (alphanumeric)+)? ('.' (alphanumeric)+)?
file_format     ::= 'UTF-8' | 'ASCII'
EPSG_code       ::= 'EPSG' ':' digit digit digit digit
geometry        ::= 'POINT(' float float ')' |
                    'POINTZ(' float float float ')' |
                    WKT_string |
                    column_name
timestamp_meanings ::= 'beginning' | 'end' | 'middle' | 'instantaneous' | 'other' | 'undefined'

/* generic */
whitespace      ::= (tab | space)+
comment         ::= '#' whitespace? (unicode-char)?
lineend         ::= (whitespace | comment)? newline
newline         ::= #x0A
tab             ::= #x9
space           ::= #x20
char            ::= [a-zA-Z]
digit           ::= [0-9]
integer         ::= [+-]? digit+
alphanumeric    ::= (digit | char)+
hex             ::= (digit | [a-fA-F])+
float           ::= integer '.' ((digit)+)?

/* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
unicode-char    ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

Interfaces

Sample NEAD file

# NEAD 1.0 UTF-8
# [METADATA]
# station_id = 803027F4
# station_name = GC-NET GOES station Summit Station
# srid = EPSG:4326
# geometry = POINTZ(38.5053, 72.5794, 3199)
# nodata = -999
# timezone = 0
# field_delimiter = ,
#
# [FIELDS]
# fields = timestamp,ISWR,OSWR,NSWR,TA1,TA2,RH1,RH2,VW1,VW2,DW1,DW2,P,HS1,HS2,V
# units_offset = 0,0,0,0,273.15,273.15,0,0,0,0,0,0,0,0,0,0
# units_multiplier = 1,1,1,1,1,1,0.01,0.01,1,1,1,1,100,1,1,1
# 
# [DATA]
1996-05-12 11:00:00+00,356.6,288.29,-999,-999,-999,96.05,94.79,3.84,4.2,186.5,-999,691.7,-999,0.05,4.59
1996-05-12 12:00:00+00,489.3,453,-999,-999,-999,95.55,94.04,4.11,4.5,205.5,-999,691.8,-999,0.01,1.05
1996-05-12 13:00:00+00,622,506.87,-15.43,-999,-999,91.01,90.89,3.39,3.58,165.3,-999,692,-999,0,0
1996-05-12 14:00:00+00,684.2,569.11,15.51,-999,-999,87.46,88.67,5.36,5.61,217.6,-999,692.2,-0.01,-0.01,12.69
1996-05-12 15:00:00+00,680.6,572.57,-90.87,-999,-999,86.25,87.5,6.82,7.13,222.3,-999,692.6,-0.01,-0.01,12.73
1996-05-12 16:00:00+00,674.6,569.3,-137.32,-999,-999,87.05,87.38,5.51,5.77,219.6,-999,692.6,0,0,12.78
1996-05-12 17:00:00+00,620.2,528.53,-157.64,-999,-999,88.73,89.67,5.78,6.06,220.9,-999,692.8,0,0.01,12.69
1996-05-12 18:00:00+00,507.6,435.89,-117.4,-999,-999,90.03,91.31,5.84,6.12,221.3,-999,693.1,0,0.01,12.64
1996-05-12 19:00:00+00,406.8,350.35,-61.34,-999,-999,91.13,92.28,5.83,6.1,229.2,-999,692.9,0,0.01,12.61
1996-05-12 20:00:00+00,366.8,319.41,-77.03,-999,-999,91.24,92.4,7.06,7.37,240.2,-999,693,0,0,12.54
1996-05-12 21:00:00+00,275.8,241.88,-92.72,-999,-999,92.66,93.76,4.87,5.16,237.9,-999,693,0,0,12.44