Chunks

MiniSEED 3 "chunks" format (white paper option #3)

Section 1: Overview

General structure

The fundamental unit of miniSEED is the data record. A time series is stored and exchanged as a sequence of these records, each of which is independently usable. An MS3 record is composed of a header, a list of blockettes ("chunks"), a record terminator (null blockette) and optional padding. The MS3 format supports arbitrary record lengths, but data centers should restrict the record length to a reasonable value (e.g., 4096 bytes) so that time windows can be extracted from a file without splitting records.

Section 2: Structure of the MS3 record

An MS3 record is composed of a header, a list of blockettes ("chunks"), a record terminator (null blockette) and optional padding. Several types of headers can be used. This standard documents only the archive record header, which is used in MS3 files. Real-time transfer protocols may use a different header.

Layout

Field	Field name	Type	Length	Offset	Content
[Archive record header]
1	Record indicator	CHAR	2	0	ASCII "MS"
2	Format version	UINT8	1	2	3
3	Record length	VARINT	V	3
[Blockettes, zero or more may be present]
4	Blockette type	VARINT	V	V
5	Blockette length	VARINT	V	V
6	Blockette payload	encoded	V	V
[Record terminator (null blockette)]
7	Blockette type	VARINT	V	V	0
8	Blockette length	VARINT	V	V	0
[Optional padding]
9	Padding	VOID	V	V

All lengths are specified in bytes, which are assumed to be 8 bits long. "V" denotes a variable length.

Data types

CHAR: Character data.
UINT8: Unsigned 8-bit integer.
UINT16: Unsigned 16-bit integer, little-endian.
UINT32: Unsigned 32-bit integer, little-endian.
FLOAT32: IEEE-754 32-bit floating point number, little-endian.
FLOAT64: IEEE-754 64-bit (double precision) floating point number, little-endian.
VARINT: Base-128 variable-length integer (little-endian), as defined in Protobuf. See also example Python implementation.
VOID: Ignored data.
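The VARINT type follows the Protobuf unsigned varint scheme: seven value bits per byte, least-significant group first, with the high bit set on every byte except the last. A minimal Python sketch of an encoder and decoder follows; the names encode_varint and decode_varint are illustrative and not part of the standard.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a Protobuf-style base-128 varint."""
    if value < 0:
        raise ValueError("VARINT values must be non-negative")
    out = bytearray()
    while True:
        group = value & 0x7F  # lowest 7 bits
        value >>= 7
        if value:
            out.append(group | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(group)  # last byte has the high bit clear
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode a varint starting at buf[pos]; return (value, position after it)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated VARINT")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
```

For example, 300 encodes as the two bytes 0xAC 0x02, while any value up to 127 takes a single byte.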

Description of fields

Record indicator: ASCII "MS".
Format version: Data format version, set to 3 for this version.
Record length: Record length in bytes.
Blockette type: See section 3.
Blockette length: Blockette length in bytes.
Blockette payload: Encoded blockette payload, see section 3.
Blockette type (terminator): Type of the null blockette (0).
Blockette length (terminator): Length of the null blockette (0).
Padding: A series of NULL bytes is recommended.
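To make the layout concrete, here is a hedged Python sketch that assembles an archive record from a list of (type, payload) pairs, appends the null blockette and pads with NULL bytes. It assumes, since the text above does not say so explicitly, that the record length counts every byte of the record from the "MS" indicator through the padding. build_record is a hypothetical helper name.

```python
def encode_varint(value: int) -> bytes:
    """Protobuf-style base-128 varint (see Data types)."""
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(group | 0x80)
        else:
            out.append(group)
            return bytes(out)


def build_record(blockettes: list[tuple[int, bytes]], record_length: int) -> bytes:
    """Assemble header, blockettes, null blockette and NULL padding.

    Assumption: record_length covers the whole record, header and padding included.
    """
    body = bytearray()
    for btype, payload in blockettes:
        body += encode_varint(btype) + encode_varint(len(payload)) + payload
    body += encode_varint(0) + encode_varint(0)  # record terminator (null blockette)
    header = b"MS" + bytes([3]) + encode_varint(record_length)  # indicator, version 3, length
    padding = record_length - len(header) - len(body)
    if padding < 0:
        raise ValueError("blockettes do not fit in the requested record length")
    return header + bytes(body) + bytes(padding)  # bytes(n) is n NULL bytes
```

For instance, build_record([(90, bytes([1]))], 64) yields a 64-byte record holding a single Data version (90) blockette with value 1, followed by the terminator and 55 NULL bytes.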

Section 3: Blockettes

Encoding

Blockettes should use one of the encodings defined in the standard. Currently the only defined encoding is a simple little-endian fixed-length struct, optionally followed by opaque variable-length data (the same as miniSEED 2.x blockettes, except that only little-endian byte order is allowed). If a new revision of the miniSEED standard adds another encoding, that encoding could be used for blockettes defined later.

In exceptional cases, new revisions of the standard may append fields to existing blockettes (as was done in miniSEED 2.x). If a blockette is shorter than expected (new software, old format), readers should pad it with NULL bytes. If a blockette is longer than expected (old software, new format), readers should truncate it.
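The pad-or-truncate rule lets a reader decode a fixed-length struct regardless of which revision wrote it. The following is a minimal sketch that uses the Sensor (10) layout from section 4 as the expected struct; fit_payload and read_sensor are illustrative names, and the short and long payloads in the usage note are made up for demonstration.

```python
import struct

SENSOR = struct.Struct("<HHHB")  # Sensor (10): vendor ID, product ID, serial number, preset


def fit_payload(payload: bytes, expected_size: int) -> bytes:
    """Pad with NULL bytes or truncate so the payload has exactly expected_size bytes."""
    if len(payload) < expected_size:
        return payload + bytes(expected_size - len(payload))  # old format: pad
    return payload[:expected_size]  # newer format: drop fields this reader does not know


def read_sensor(payload: bytes) -> tuple[int, int, int, int]:
    return SENSOR.unpack(fit_payload(payload, SENSOR.size))
```

A 4-byte payload decodes with serial number and preset set to 0, while a payload with two extra trailing bytes decodes as if they were absent.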

Order

For efficiency, essential blockettes (e.g., time series identifier, record start time) should occur near the beginning of a record. In that case, assuming that only one instance of each such blockette per record is allowed and the record length is known, a reader can skip to the next record as soon as all relevant blockettes have been found.
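The early-exit reading strategy can be sketched as a scanner over a buffer of concatenated records. This is a hedged sketch, not a reference reader: it assumes one instance per wanted blockette type, and that the record length counts the whole record, so the next record starts record_length bytes after the current "MS". scan_records is a hypothetical name.

```python
def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Protobuf-style base-128 varint; returns (value, next position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def scan_records(data: bytes, wanted: set[int]):
    """Yield {blockette type: payload} for the wanted types in each record.

    Assumption: record length covers the whole record, header and padding included.
    """
    pos = 0
    while pos < len(data):
        if data[pos:pos + 3] != b"MS\x03":
            raise ValueError(f"no MS3 record at offset {pos}")
        record_length, cur = decode_varint(data, pos + 3)
        if record_length <= 0:
            raise ValueError(f"invalid record length at offset {pos}")
        found = {}
        while len(found) < len(wanted):  # stop early once everything is found
            btype, cur = decode_varint(data, cur)
            blen, cur = decode_varint(data, cur)
            if btype == 0:
                break  # null blockette: end of record
            if btype in wanted:
                found[btype] = data[cur:cur + blen]
            cur += blen
        yield found
        pos += record_length  # skip remaining blockettes and padding
```

Placing essential blockettes first is what makes the inner loop short: for a record whose wanted blockettes appear at the start, the remaining blockettes are never decoded.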

If a blockette depends on another blockette, the blockette it depends on must occur first. For example, the waveform metadata blockette must occur before the waveform blockettes that depend on it.

Waveform blockettes must be sorted by time. Intra-record data gaps are not possible.

Allocation of blockette types

0..999999	reserved for organizations
	0..99999	reserved for the FDSN standard
		0..127	essential blockettes (1-byte ID)
		128..16383	important blockettes (2-byte ID)
	100000..199999	reserved for IRIS extensions
	200000..299999	reserved for EIDA extensions
1000000+	reserved for manufacturer extensions
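The 1-byte and 2-byte ID split follows directly from the VARINT encoding: types 0..127 fit in one byte and 128..16383 in two. A small sketch that maps a type to its range and computes its encoded size may help when choosing IDs. The returned labels are illustrative; note that 16384..99999 and 300000..999999 have no more specific allocation in the table above.

```python
def varint_size(value: int) -> int:
    """Number of bytes a non-negative value occupies as a VARINT (7 bits per byte)."""
    return max(1, (value.bit_length() + 6) // 7)


def classify_blockette_type(btype: int) -> str:
    """Map a blockette type to its range in the allocation table."""
    if btype < 0:
        raise ValueError("blockette types are non-negative")
    if btype <= 127:
        return "FDSN essential"
    if btype <= 16383:
        return "FDSN important"
    if btype <= 99999:
        return "FDSN standard"
    if btype <= 199999:
        return "IRIS extension"
    if btype <= 299999:
        return "EIDA extension"
    if btype <= 999999:
        return "organizations (unassigned)"
    return "manufacturer extension"
```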

Section 4: Definition of standard blockettes

Below is the [incomplete] list of standard blockettes. Unless noted otherwise, only one instance of a blockette per record is allowed.

Record terminator (0)

This blockette is used for terminating the record.

Time series identifier (1)

Time series identifier as defined by the FDSN. Future revisions of the standard may add alternative time series identifiers.

Field	Field name	Type	Length	Offset
1	Time series identifier	V	V	0

Record start time (2)

Time of the first data sample and related flags, represented in UTC using individual fields for year, day-of-year, hour, minute, second and nanosecond. A second value of 60 represents a time during a positive leap second.

Future revisions of the standard may add a relative time blockette, which could be useful for simulations and synthetic data.

Field	Field name	Type	Length	Offset
1	Year (0-65535)	UINT16	2	0
2	Day-of-year (1-366)	UINT16	2	2
3	Hour (0-23)	UINT8	1	4
4	Minute (0-59)	UINT8	1	5
5	Second (0-60)	UINT8	1	6
6	Nanosecond (0-999999999)	UINT32	4	7
7	Flags	UINT8	1	11

Flags

[Bit 0]: Time tag is questionable.
[Bit 1]: Clock locked.
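Because Python's struct module with the "<" prefix packs little-endian fields without alignment padding, the layout above maps onto a single format string of 12 bytes. The following is a minimal sketch; the helper names are illustrative. Python's datetime has only microsecond resolution and cannot represent second 60, so leap-second values have to be packed directly.

```python
import struct
from datetime import datetime

# Record start time (2): year, day-of-year, hour, minute, second, nanosecond, flags.
# "<" = little-endian, no padding: 12 bytes, fields at offsets 0, 2, 4, 5, 6, 7, 11.
START_TIME = struct.Struct("<HHBBBIB")

FLAG_TIME_QUESTIONABLE = 1 << 0  # bit 0
FLAG_CLOCK_LOCKED = 1 << 1  # bit 1


def start_time_from_datetime(dt: datetime, flags: int = 0) -> bytes:
    """Pack a datetime assumed to be in UTC (microsecond resolution only)."""
    return START_TIME.pack(
        dt.year, dt.timetuple().tm_yday, dt.hour, dt.minute, dt.second,
        dt.microsecond * 1000, flags,
    )


def unpack_start_time(payload: bytes) -> dict:
    year, doy, hour, minute, second, ns, flags = START_TIME.unpack(payload[:START_TIME.size])
    return {
        "year": year, "doy": doy, "hour": hour, "minute": minute,
        "second": second, "nanosecond": ns,
        "questionable": bool(flags & FLAG_TIME_QUESTIONABLE),
        "clock_locked": bool(flags & FLAG_CLOCK_LOCKED),
    }
```

For example, 2024-03-01 12:30:15.25 packs with day-of-year 61 (2024 is a leap year) and nanosecond 250000000.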

Sensor (10)

Optional sensor identification.

Field	Field name	Type	Length	Offset
1	Vendor ID	UINT16	2	0
2	Product ID	UINT16	2	2
3	Serial number	UINT16	2	4
4	Preset	UINT8	1	6

Vendor ID: Vendor ID, such as used with USB devices.
Product ID: Product ID, such as used with USB devices.
Serial number: Serial number of the device.
Preset: Device preset, identifying gain settings, filters, etc.

Datalogger (11)

Optional datalogger (digitizer) identification.

Field	Field name	Type	Length	Offset
1	Vendor ID	UINT16	2	0
2	Product ID	UINT16	2	2
3	Serial number	UINT16	2	4
4	Preset	UINT8	1	6

Vendor ID: Vendor ID, such as used with USB devices.
Product ID: Product ID, such as used with USB devices.
Serial number: Serial number of the device.
Preset: Device preset, identifying gain settings, filters, etc.
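Sensor (10) and Datalogger (11) share the same 7-byte layout, so one reader and writer can serve both. The following is a minimal sketch; DeviceId is an illustrative name, and the vendor and product IDs in the usage note are made-up values, not registered identifiers. Note that a UINT16 serial number is limited to 0..65535.

```python
import struct
from typing import NamedTuple

DEVICE = struct.Struct("<HHHB")  # shared layout of Sensor (10) and Datalogger (11): 7 bytes


class DeviceId(NamedTuple):
    vendor_id: int
    product_id: int
    serial_number: int
    preset: int

    def pack(self) -> bytes:
        return DEVICE.pack(*self)

    @classmethod
    def unpack(cls, payload: bytes) -> "DeviceId":
        return cls(*DEVICE.unpack(payload[:DEVICE.size]))
```

DeviceId(0x1234, 0x0042, 1001, 3).pack() produces the bytes 34 12 42 00 E9 03 03.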

Gain (12)

This blockette must accompany the Sensor (10) or Datalogger (11) blockette when a non-standard gain or custom gain reduction is used.

Field	Field name	Type	Length	Offset
1	Gain	FLOAT64	8	0

Gain: The value 1.0 corresponds to standard gain of the respective sensor/datalogger/preset combination.

Waveform metadata (20)

Metadata for all waveform blockettes in a record. This blockette must occur before any waveform data blockettes (21, 22).

Field	Field name	Type	Length	Offset
1	Sample rate/period	FLOAT32	4	0
2	Data encoding format	UINT8	1	4

Sample rate/period

Sample rate encoded in IEEE-754 floating point format. A positive value is the rate in samples per second; a negative value is the sample period in seconds. Creators should use the negative sample period notation for rates below 1 sample per second to retain resolution.
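The resolution argument is easy to see with FLOAT32: 0.1 Hz cannot be stored exactly, but a period of 10 s can. A minimal sketch of the creator and reader sides follows; the function names are illustrative, and a value of 0 is simply rejected because its meaning is not defined here.

```python
import struct


def encode_sample_rate(rate_hz: float) -> bytes:
    """FLOAT32 field: rate in Hz if >= 1, otherwise the negative sample period."""
    if rate_hz <= 0:
        raise ValueError("sample rate must be positive")
    value = rate_hz if rate_hz >= 1 else -1.0 / rate_hz
    return struct.pack("<f", value)


def decode_sample_rate(payload: bytes) -> float:
    """Return the sample rate in Hz from the FLOAT32 rate/period field."""
    (value,) = struct.unpack("<f", payload[:4])
    if value > 0:
        return value  # samples per second
    if value < 0:
        return -1.0 / value  # value is minus the sample period in seconds
    raise ValueError("a zero rate/period is not handled by this sketch")
```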

Data encoding format

A code indicating the encoding format. The following codes are defined:

1: 16-bit integers, little-endian
3: 32-bit integers, little-endian
4: IEEE 32-bit floats, little-endian
5: IEEE 64-bit floats, little-endian
10: Steim-1 integer compression (defined only in big-endian)
11: Steim-2 integer compression (defined only in big-endian)
19: Steim-3 integer compression (defined only in big-endian)
53: 32-bit integers, little-endian, general compressor (TBD)
54: 32-bit IEEE floats, little-endian, general compressor (TBD)
55: 64-bit IEEE floats, little-endian, general compressor (TBD)
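The uncompressed encodings (1, 3, 4, 5) map directly onto Python struct format characters. The following sketch decodes such payloads; the Steim and general-compressor codes need dedicated decoders and are deliberately left unimplemented here. decode_samples is an illustrative name.

```python
import struct

# Uncompressed encodings from the list above -> struct format character.
UNCOMPRESSED = {1: "h", 3: "i", 4: "f", 5: "d"}


def decode_samples(encoding: int, data: bytes, count: int) -> list:
    """Decode count little-endian samples of an uncompressed encoding."""
    fmt = UNCOMPRESSED.get(encoding)
    if fmt is None:
        raise NotImplementedError(f"encoding {encoding} is not handled in this sketch")
    size = count * struct.calcsize(fmt)
    return list(struct.unpack(f"<{count}{fmt}", data[:size]))
```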

Waveform data (21)

Waveform data with up to 255 samples. Using multiple small waveform blockettes per record is recommended for better real-time latency.

Field	Field name	Type	Length	Offset
1	Number of samples	UINT8	1	0
2	Data payload	encoded	V	1
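Since the sample count is a UINT8, a longer series has to be split across several Waveform data (21) blockettes, which also matches the latency recommendation above. The following is a minimal sketch, assuming encoding 3 (32-bit little-endian integers) has been declared in the preceding Waveform metadata (20) blockette. waveform_blockettes_int32 is a hypothetical name.

```python
import struct

MAX_SAMPLES_21 = 255  # Number of samples is a UINT8


def waveform_blockettes_int32(samples: list[int]) -> list[tuple[int, bytes]]:
    """Split int32 samples into (21, payload) pairs; encoding 3 assumed."""
    out = []
    for start in range(0, len(samples), MAX_SAMPLES_21):
        chunk = samples[start:start + MAX_SAMPLES_21]
        payload = struct.pack(f"<B{len(chunk)}i", len(chunk), *chunk)  # count, then data
        out.append((21, payload))
    return out
```

Splitting 600 samples this way gives blockettes of 255, 255 and 90 samples, in time order as required by section 3.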

Large waveform data (22)

Waveform data with up to 2^32-1 samples. Multiple instances of this blockette per record are allowed.

Field	Field name	Type	Length	Offset
1	Number of samples	UINT32	4	0
2	Data payload	encoded	V	4

Log (23)

Log message. Multiple instances of this blockette per record are allowed.

Field	Field name	Type	Length	Offset
1	UTF-8 text	V	V	0

CRC-32 (30)

CRC-32C (Castagnoli) value, calculated over the preceding blockettes, excluding the header. Excluding the header (which contains the record length) makes it possible to add blockettes in a data center without invalidating the CRC-32 value calculated by a digitizer. Multiple CRC-32 blockettes per record can be used.

Field	Field name	Type	Length	Offset
1	CRC-32 value	UINT32	4	0
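Python's zlib.crc32 implements the IEEE polynomial, not Castagnoli, so a dedicated routine is needed. The following bitwise version is slow but short; it reproduces the standard CRC-32C check value 0xE3069283 for the ASCII string "123456789". It assumes, as an interpretation of "preceding blockettes", that the CRC covers the raw encoded bytes (type, length and payload) of every blockette between the end of the header and the CRC blockette. crc_blockette is a hypothetical helper.

```python
import struct


def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78).

    Passing a previous result as crc continues the computation incrementally.
    """
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def crc_blockette(preceding: bytes) -> bytes:
    """Encode a CRC-32 (30) blockette: type 30, length 4, UINT32 little-endian."""
    # 30 and 4 are both below 128, so each VARINT is a single byte.
    return bytes([30, 4]) + struct.pack("<I", crc32c(preceding))
```

The incremental form fits a digitizer that computes the CRC while streaming blockettes.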

Data version (90)

Recommended values: 1 for raw data, 2 for data that has passed quality control procedures, incremented by one for each later revision.

Field	Field name	Type	Length	Offset
1	Data version	UINT8	1	0

Timing quality (100)

A vendor-specific timing quality value, from 0 to 100 (% of maximum accuracy).

Field	Field name	Type	Length	Offset
1	Timing quality	UINT8	1	0

Quality indicator (102)

Quality indicator, primarily for older data; its use is not recommended for new data.

Field	Field name	Type	Length	Offset
1	Quality indicator	CHAR	1	0

JSON data (126)

User-defined extension (JSON). Multiple instances of this blockette per record are allowed.

Field	Field name	Type	Length	Offset
1	JSON data (UTF-8)	V	V	0
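Writing a JSON data (126) payload amounts to serializing to UTF-8. The following sketch uses compact separators to save bytes inside size-limited records, a design choice rather than a requirement of the text. The function names and the example object are illustrative.

```python
import json
from typing import Any


def json_blockette_payload(obj: Any) -> bytes:
    """Serialize obj as compact UTF-8 JSON for a JSON data (126) blockette."""
    return json.dumps(obj, ensure_ascii=False, separators=(",", ":")).encode("utf-8")


def parse_json_blockette(payload: bytes) -> Any:
    return json.loads(payload.decode("utf-8"))
```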

Generic (127)

User-defined extension (binary). Multiple instances of this blockette per record are allowed.

Field	Field name	Type	Length	Offset
1	UUID	CHAR	16	0
2	Data payload	V	V	16

UUID: Data type identification (https://en.wikipedia.org/wiki/Universally_unique_identifier).
Data payload: Data payload, corresponding to the UUID.
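The following is a minimal sketch of building and parsing a Generic (127) payload. The text only says the UUID occupies 16 bytes; this sketch assumes the RFC 4122 byte order produced by uuid.UUID.bytes (Python also offers UUID.bytes_le, so the choice should be fixed by the standard). A data type would typically get one UUID, generated once (e.g. with uuid.uuid4()) and then hard-coded; the UUID below is a placeholder.

```python
import uuid


def generic_payload(type_id: uuid.UUID, data: bytes) -> bytes:
    """Generic (127) payload: 16-byte UUID followed by the data it identifies."""
    return type_id.bytes + data  # assumption: RFC 4122 (big-endian) byte order


def parse_generic(payload: bytes) -> tuple[uuid.UUID, bytes]:
    if len(payload) < 16:
        raise ValueError("Generic blockette shorter than its 16-byte UUID")
    return uuid.UUID(bytes=payload[:16]), payload[16:]
```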
