Chunks
The fundamental unit of miniSEED is a data record. Normally, a time series is stored and exchanged as a sequence of these records, and each record is independently usable even when presented in a sequence. The format supports streaming at blockette granularity, while the record is still being generated. The format does not limit the total size of a data record. Record lengths in the range of 256 to 4096 bytes are recommended for raw data generation and archiving. This limits problems with timing system drift and resolution. It also limits practical issues of subsetting and resource limitations for readers of the data.
An MS3 record is composed of a header, followed by zero or more blockettes. This standard documents the archive record header, which is used in MS3 files. Streaming protocols may transfer individual blockettes and use a different header.
Field | Field name | Type | Length | Offset | Content |
---|---|---|---|---|---|
[Archive record header] | |||||
1 | Record indicator | CHAR | 3 | 0 | ASCII "MS3" |
2 | Record length | VARINT | V | 3 | |
[Blockettes, zero or more may be present] | |||||
3 | Blockette type | VARINT | V | V | |
4 | Blockette length | VARINT | V | V | |
5 | Blockette payload | encoded | V | V | |
All lengths are specified in bytes, which are assumed to be 8 bits in length. "V" denotes variable length.
- CHAR
- Character data.
- INT8
- Signed 8-bit integer.
- UINT8
- Unsigned 8-bit integer.
- UINT16
- Unsigned 16-bit integer, little-endian.
- UINT32
- Unsigned 32-bit integer, little-endian.
- FLOAT64
- IEEE-754 64-bit (double precision) floating point number, little-endian.
- VARINT
- Base 128 variable length integer (little-endian) as defined in Protobuf (RFC). See also example Python implementation.
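The draft refers to an example Python implementation of VARINT, which is not reproduced here. The block below is an independent minimal sketch of Protobuf-style base-128 varints. Each byte carries 7 value bits, least significant group first, and the high bit marks that more bytes follow. The sketch handles only non-negative values, which covers record lengths, blockette types and blockette lengths.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a Protobuf-style base-128 varint."""
    if value < 0:
        raise ValueError("VARINT values must be non-negative")
    out = bytearray()
    while True:
        byte = value & 0x7F          # low 7 bits of the remaining value
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow: set continuation bit
        else:
            out.append(byte)         # final byte: continuation bit clear
            return bytes(out)


def decode_varint(buf: bytes, offset: int = 0) -> tuple:
    """Decode a varint starting at buf[offset]; return (value, next_offset)."""
    value = 0
    shift = 0
    while True:
        if offset >= len(buf):
            raise ValueError("truncated VARINT")
        byte = buf[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, offset
        shift += 7
```

For example, `encode_varint(300)` yields `b'\xac\x02'`, which matches the well-known Protobuf encoding of 300.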
- Record indicator -- ASCII "MS3".
- Record length in bytes.
- Blockette type, see section 3.
- Blockette length in bytes.
- Encoded blockette payload, see section 3.
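The following is a minimal sketch of walking the archive record layout from the table above. It checks the "MS3" indicator, reads the VARINT record length, then yields each (type, payload) pair. The draft does not say which bytes the two lengths count, so the sketch makes two assumptions. First, the record length counts the bytes that follow the record length field. Second, the blockette length counts only the payload. The function name `iter_blockettes` is hypothetical.

```python
from typing import Iterator, Tuple


def decode_varint(buf: bytes, offset: int) -> Tuple[int, int]:
    """Base-128 varint decoder; returns (value, next_offset)."""
    value = shift = 0
    while True:
        if offset >= len(buf):
            raise ValueError("truncated VARINT")
        byte = buf[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, offset
        shift += 7


def iter_blockettes(record: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (blockette_type, payload) pairs from one MS3 archive record.

    ASSUMPTIONS (not settled by the draft): the record length counts the
    bytes after the record length field; the blockette length counts only
    the payload bytes.
    """
    if record[:3] != b"MS3":
        raise ValueError("missing 'MS3' record indicator")
    record_length, pos = decode_varint(record, 3)
    end = pos + record_length
    if end > len(record):
        raise ValueError("record is truncated")
    while pos < end:
        btype, pos = decode_varint(record, pos)
        blen, pos = decode_varint(record, pos)
        if pos + blen > end:
            raise ValueError("blockette overruns record")
        yield btype, record[pos:pos + blen]
        pos += blen
```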
The following encodings are currently under consideration:
- Fixed-length struct (little-endian), optionally followed by opaque variable length data (same as miniSEED 2.x blockettes, except that only little-endian is allowed).
- Protobuf (RFC). A blockette would be represented as a single field or an embedded message, where field number would be equal to blockette number. Note that Protobuf supports "repeated" fields, which are useful for waveform data and other blockettes that may appear multiple times in a record.
For efficiency reasons, essential blockettes (e.g., time series identifier, record start time) should occur near the beginning of a record. Assume that only one instance of each blockette is allowed per record and that the record length is known. A reader can then skip to the next record as soon as all relevant blockettes have been found.
If a blockette depends on other blockettes, those blockettes must occur before it. For example, the waveform metadata blockette must occur before the waveform blockettes that depend on it.
Waveform blockettes must be sorted by time. Intra-record data gaps are not possible.
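The early-exit reading strategy described above can be sketched as follows. The reader collects only the wanted blockette types. It stops once all of them are found and returns the offset of the next record, taken from the record length. The sketch assumes one instance per blockette type. It also assumes that the record length counts the bytes after the record length field and that the blockette length counts only the payload; the draft fixes neither. `read_wanted` is a hypothetical helper name.

```python
from typing import Dict, Iterable, Tuple


def decode_varint(buf: bytes, offset: int) -> Tuple[int, int]:
    """Base-128 varint decoder; returns (value, next_offset)."""
    value = shift = 0
    while True:
        if offset >= len(buf):
            raise ValueError("truncated VARINT")
        byte = buf[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, offset
        shift += 7


def read_wanted(stream: bytes, offset: int,
                wanted: Iterable[int]) -> Tuple[Dict[int, bytes], int]:
    """Collect payloads of the `wanted` blockette types from the record that
    starts at `offset`, stopping as soon as all of them have been seen.

    Returns (payloads_by_type, offset_of_next_record). ASSUMES one instance
    per blockette type, a record length counting the bytes after the record
    length field, and a blockette length counting payload bytes only.
    """
    if stream[offset:offset + 3] != b"MS3":
        raise ValueError("missing 'MS3' record indicator")
    record_length, pos = decode_varint(stream, offset + 3)
    next_record = pos + record_length
    remaining = set(wanted)
    found: Dict[int, bytes] = {}
    while remaining and pos < next_record:
        btype, pos = decode_varint(stream, pos)
        blen, pos = decode_varint(stream, pos)
        if btype in remaining:
            found[btype] = stream[pos:pos + blen]
            remaining.discard(btype)
        pos += blen
    # Jump straight to the next record; later blockettes are never parsed.
    return found, next_record
```

The benefit depends on the ordering rule. If the essential blockettes come first, the loop exits after a few bytes, and the large waveform payloads are never touched.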
Blockette type range | Usage |
---|---|
0..999999 | reserved for organizations |
0..99999 | reserved for the FDSN standard |
0..127 | essential blockettes (1-byte ID) |
128..16383 | important blockettes (2-byte ID) |
1000000+ | reserved for manufacturer extensions |
Note: with Protobuf encoding, IDs 1..15 would take 1 byte, 16..2047 would take 2 bytes, 2048..262143 would take 3 bytes, etc. However, 1 byte would be saved when encoding a single-value blockette.
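The size boundaries above can be checked with a short calculation. An MS3 blockette type is a plain VARINT, so IDs up to 127 fit in one byte. A Protobuf field key is the VARINT of `(field_number << 3) | wire_type`, which consumes three bits and explains the smaller 1..15 range. The sketch below is illustrative only. It uses wire type 2 (length-delimited), which does not change the key size for any field number.

```python
def varint_size(value: int) -> int:
    """Number of bytes needed to encode `value` as a base-128 varint."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def protobuf_tag_size(field_number: int, wire_type: int = 2) -> int:
    """Size of a Protobuf field key: varint of (field_number << 3) | wire_type."""
    return varint_size((field_number << 3) | wire_type)


for blockette_id in (1, 15, 16, 127, 128, 2047, 2048, 16383, 16384, 262143):
    print(f"{blockette_id:>7}: MS3 type {varint_size(blockette_id)} B, "
          f"Protobuf key {protobuf_tag_size(blockette_id)} B")
```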
Below is the [incomplete] list of standard blockettes. Unless noted otherwise, only one instance of a blockette per record is allowed.
Flags are currently represented as a group of 8 bits (UINT8) in a single blockette. An alternative would be to use a zero-length blockette (2 bytes) or a boolean blockette (3 bytes) for each individual flag.
If Protobuf encoding is used, blockettes 21 and 22 would be unified, because both would have the same size.
Time series identifier as defined by the FDSN. Future revisions of the standard may add alternative time series identifiers to be used in other ecosystems.
Field | Field name | Type | Length | Offset |
---|---|---|---|---|
1 | Time series identifier | V | V | 0 |
Time of the first data sample, and related flags. UTC is represented using individual fields for year, day-of-year, hour, minute, second and nanosecond. A second value of 60 represents a time during a positive leap second.
Future revisions of the standard may add a relative time blockette, which could be useful for simulations and synthetic data.
Field | Field name | Type | Length | Offset |
---|---|---|---|---|
1 | Year (0-65535) | UINT16 | 2 | 0 |
2 | Day-of-year (1-366) | UINT16 | 2 | 2 |
3 | Hour (0-23) | UINT8 | 1 | 4 |
4 | Minute (0-59) | UINT8 | 1 | 5 |
5 | Second (0-60) | UINT8 | 1 | 6 |
6 | Nanosecond (0-999999999) | UINT32 | 4 | 7 |
7 | Flags | UINT8 | 1 | 11 |
- Flags
  - [Bit 0] Time tag is questionable.
  - [Bit 1] Clock locked.
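Because the fields above are packed without padding (the UINT32 sits at offset 7), Python's `struct` module with the `<` prefix reproduces the 12-byte layout directly. The sketch below packs and unpacks the start time payload. It formats the result as a year/day-of-year string instead of a `datetime`, because `datetime` cannot hold a second value of 60. The class and helper names are hypothetical.

```python
import struct
from typing import NamedTuple

# Little-endian, no padding: UINT16 year, UINT16 day-of-year, UINT8 hour,
# UINT8 minute, UINT8 second, UINT32 nanosecond, UINT8 flags -> 12 bytes.
START_TIME = struct.Struct("<HHBBBIB")

TIME_QUESTIONABLE = 1 << 0  # flags bit 0
CLOCK_LOCKED = 1 << 1       # flags bit 1


class StartTime(NamedTuple):
    year: int
    day_of_year: int
    hour: int
    minute: int
    second: int       # 60 only during a positive leap second
    nanosecond: int
    flags: int


def pack_start_time(t: StartTime) -> bytes:
    return START_TIME.pack(*t)


def unpack_start_time(payload: bytes) -> StartTime:
    return StartTime(*START_TIME.unpack(payload))


def format_start_time(t: StartTime) -> str:
    """YYYY,DDD,HH:MM:SS.nnnnnnnnn; a string, since datetime rejects second=60."""
    return (f"{t.year:04d},{t.day_of_year:03d},"
            f"{t.hour:02d}:{t.minute:02d}:{t.second:02d}.{t.nanosecond:09d}")
```

For example, the leap second at the end of 2016 would be `StartTime(2016, 366, 23, 59, 60, 500000000, CLOCK_LOCKED)`, which formats as `2016,366,23:59:60.500000000`.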
One or more leap seconds occurred during this record. The value specifies the number and direction of the leap seconds. For example, use “+1” to specify a single positive leap second and “-1” to specify a single negative leap second.
Field | Field name | Type | Length | Offset |
---|---|---|---|---|
1 | Leap second | INT8 | 1 | 0 |
Optional sensor identification.
Field | Field name | Type | Length | Offset |
---|---|---|---|---|
1 | Vendor ID | UINT16 | 2 | 0 |
2 | Product ID | UINT16 | 2 | 2 |
3 | Serial number | UINT16 | 2 | 4 |
4 | Component | UINT8 | 1 | 6 |
5 | Preset | UINT8 | 1 | 7 |
- Vendor ID
- Vendor ID, such as used with USB devices.
- Product ID
- Product ID, such as used with USB devices.
- Serial number
- Serial number of the device.
- Component
- Component, e.g., 0=Z, 1=N, 2=E. Device-specific.
- Preset
- A code indicating gain and filter settings. Device-specific.
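The identification payload is a fixed 8-byte little-endian struct, which can be sketched with `struct` as below. All values in the usage example are made up. The datalogger identification blockette has the same layout, with Channel in place of Component, so the same format string applies. With `struct`, a serial number above 65535 raises `struct.error`, because the field is a UINT16.

```python
import struct
from typing import NamedTuple

# Vendor ID, Product ID, Serial number (UINT16 each), Component, Preset (UINT8)
SENSOR_ID = struct.Struct("<HHHBB")  # 8 bytes, little-endian, unpadded


class SensorId(NamedTuple):
    vendor_id: int
    product_id: int
    serial_number: int
    component: int   # device-specific, e.g. 0=Z, 1=N, 2=E
    preset: int      # device-specific gain/filter code


def pack_sensor_id(s: SensorId) -> bytes:
    return SENSOR_ID.pack(*s)


def unpack_sensor_id(payload: bytes) -> SensorId:
    return SensorId(*SENSOR_ID.unpack(payload))


# Hypothetical values, for illustration only.
example = SensorId(vendor_id=0x1234, product_id=0x0001,
                   serial_number=4242, component=2, preset=7)
```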
Optional datalogger (digitizer) identification.
Field | Field name | Type | Length | Offset |
---|---|---|---|---|
1 | Vendor ID | UINT16 | 2 | 0 |
2 | Product ID | UINT16 | 2 | 2 |
3 | Serial number | UINT16 | 2 | 4 |
4 | Channel | UINT8 | 1 | 6 |
5 | Preset | UINT8 | 1 | 7 |
- Vendor ID
- Vendor ID, such as used with USB devices.
- Product ID
- Product ID, such as used with USB devices.
- Serial number
- Serial number of the device.
- Channel
- Channel, e.g., 0=Z1, 1=N1, 2=E1, 3=Z2, 4=N2, 5=E2, 6=supply voltage, etc. Device-specific.
- Preset
- A code indicating channel settings (gain, filters, etc.). Device/channel-specific.
This blockette must be present in addition to blockettes 10 and 11 when a non-standard gain or a custom gain reduction is used.
Field | Field name | Type | Length | Offset |
---|---|---|---|---|
1 | Gain | FLOAT64 | 8 | 0 |
- Gain
- The value 1.0 corresponds to the standard gain of the respective sensor/datalogger/preset combination.
Metadata for all waveform blockettes in a record. This blockette must occur before any waveform data blockettes (21, 22).
Field | Field name | Type | Length | Offset |
---|---|---|---|---|
1 | Sample rate/period | FLOAT64 | 8 | 0 |
2 | Data encoding format | UINT8 | 1 | 8 |
- Sample rate/period
- When the value is positive, it represents the sample rate in samples per second; when it is negative, it represents the sample period in seconds. Creators should use the negative (sample period) notation for rates below 1 sample per second to retain resolution.
- Data encoding format
- A code indicating the encoding format. The following codes are defined:
  - 1: 16-bit integers, little-endian
  - 3: 32-bit integers, little-endian
  - 4: IEEE 32-bit floats, little-endian
  - 5: IEEE 64-bit floats, little-endian
  - 10: Steim-1 integer compression (defined only as big-endian)
  - 11: Steim-2 integer compression (defined only as big-endian)
  - 19: Steim-3 integer compression (defined only as big-endian)
  - 53: 32-bit integers, little-endian, general compressor (TBD)
  - 54: IEEE 32-bit floats, little-endian, general compressor (TBD)
  - 55: IEEE 64-bit floats, little-endian, general compressor (TBD)
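The sign convention of the sample rate/period field can be captured in two small conversions. The 9-byte metadata payload (FLOAT64 followed by UINT8) packs with `<dB`. This is a minimal sketch. The draft does not say what a value of zero means, so the sketch treats zero as an error; that choice is an assumption.

```python
import struct
from typing import Tuple

# FLOAT64 sample rate/period followed by UINT8 encoding code -> 9 bytes
WAVEFORM_METADATA = struct.Struct("<dB")


def encode_rate(samples_per_second: float) -> float:
    """Store rates below 1 sps as a negative sample period, as recommended."""
    if samples_per_second <= 0:
        raise ValueError("sample rate must be positive")
    if samples_per_second < 1.0:
        return -1.0 / samples_per_second   # period in seconds, negated
    return samples_per_second


def decode_rate(value: float) -> float:
    """Return the sample rate in samples per second."""
    if value > 0:
        return value
    if value < 0:
        return -1.0 / value                # negative period -> rate
    # ASSUMPTION: zero is not defined by the draft; reject it here.
    raise ValueError("zero sample rate/period")


def pack_waveform_metadata(samples_per_second: float, encoding: int) -> bytes:
    return WAVEFORM_METADATA.pack(encode_rate(samples_per_second), encoding)


def unpack_waveform_metadata(payload: bytes) -> Tuple[float, int]:
    value, encoding = WAVEFORM_METADATA.unpack(payload)
    return decode_rate(value), encoding
```

A 0.25 sps channel is stored as `-4.0`, a period of 4 s. This avoids the rounding that a rate such as 1/3 sps would suffer if stored directly.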
Waveform data of up to 255 samples. Using multiple small waveform blockettes per record is recommended to achieve lower real-time latency.
Field | Field name | Type | Length | Offset |
---|---|---|---|---|
1 | Number of samples | UINT8 | 1 | 0 |
2 | Data payload | encoded | V | 1 |
Waveform data of up to 2^32 - 1 samples. Multiple instances of this blockette per record are allowed.
Field | Field name | Type | Length | Offset |
---|---|---|---|---|
1 | Number of samples | UINT32 | 4 | 0 |
2 | Data payload | encoded | V | 4 |
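The recommendation to emit several small waveform blockettes can be sketched as chunking a sample stream into pieces of at most 255 samples. Each piece gets a UINT8 count and an uncompressed payload. The sketch rests on three assumptions. Blockette type 21 is taken to be the up-to-255-sample blockette, inferred only from the order in which the two waveform blockettes are listed. The payload uses encoding 3 (32-bit integers, little-endian). The blockette length counts only the payload. The names are hypothetical.

```python
import struct
from typing import List, Sequence


def encode_varint(value: int) -> bytes:
    """Base-128 varint encoder for non-negative integers."""
    out = bytearray()
    while True:
        byte, value = value & 0x7F, value >> 7
        out.append(byte | 0x80 if value else byte)
        if not value:
            return bytes(out)


SMALL_WAVEFORM_TYPE = 21  # ASSUMPTION: 21 = up-to-255-sample blockette


def small_waveform_blockettes(samples: Sequence[int],
                              chunk: int = 255) -> List[bytes]:
    """Split samples into encoded blockettes of at most `chunk` samples.

    Payload: UINT8 sample count, then 32-bit little-endian integers
    (encoding 3). ASSUMES the blockette length counts the payload only.
    """
    if not 1 <= chunk <= 255:
        raise ValueError("chunk must fit in a UINT8 sample count")
    blockettes = []
    for start in range(0, len(samples), chunk):
        part = samples[start:start + chunk]
        payload = struct.pack(f"<B{len(part)}i", len(part), *part)
        blockettes.append(encode_varint(SMALL_WAVEFORM_TYPE)
                          + encode_varint(len(payload)) + payload)
    return blockettes
```

Using a smaller `chunk` lets a digitizer flush each blockette sooner, at the cost of 3 to 4 bytes of type and length overhead per blockette.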
Log message. Multiple instances of this blockette per record are allowed.
Field | Field name | Type | Length | Offset |
---|---|---|---|---|
1 | UTF-8 text | V | V | 0 |
CRC-32C (Castagnoli) value, calculated over the preceding blockettes, with the header excluded. Excluding the header, which contains the record length, makes it possible to add blockettes at a data center without invalidating the CRC-32C value calculated in a digitizer. Multiple CRC-32C blockettes per record can be used.
Field | Field name | Type | Length | Offset |
---|---|---|---|---|
1 | CRC-32 value | UINT32 | 4 | 0 |
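Below is a minimal bitwise CRC-32C sketch using the reflected Castagnoli polynomial `0x82F63B78`. It can be checked against the standard test vector: CRC-32C of `b"123456789"` is `0xE3069283`. The draft does not specify exactly which bytes count as "preceding blockettes". The sketch assumes they are the complete encoded blockettes (type, length and payload) from the first blockette up to the CRC blockette. `crc_blockette_payload` is a hypothetical helper. A production implementation would typically be table-driven or hardware-accelerated.

```python
import struct


def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.

    Pass a previous result as `crc` to continue a running checksum.
    """
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; XOR in the polynomial when the dropped bit was 1.
            crc = (crc >> 1) ^ (0x82F63B78 & -(crc & 1))
    return crc ^ 0xFFFFFFFF


def crc_blockette_payload(preceding_blockettes: bytes) -> bytes:
    """UINT32 little-endian CRC-32C over the encoded preceding blockettes.

    ASSUMPTION: `preceding_blockettes` holds the full encoded blockettes
    (type, length, payload) that precede the CRC blockette; header excluded.
    """
    return struct.pack("<I", crc32c(preceding_blockettes))
```

Because the checksum can be continued, a data center that appends blockettes can extend the digitizer's value to cover the new bytes and write a second CRC blockette, without recomputing from scratch.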
Data version. Recommended values are 1 for raw data and 2 for data that has undergone quality control procedures; the value is incremented for each later revision.
Field | Field name | Type | Length | Offset |
---|---|---|---|---|
1 | Data version | UINT8 | 1 | 0 |
Quality indicator, primarily for older data; its use is not recommended for new data.
Field | Field name | Type | Length | Offset |
---|---|---|---|---|
1 | Quality indicator | CHAR | 1 | 0 |
Signal quality flags, ported from miniSEED 2.
Field | Field name | Type | Length | Offset |
---|---|---|---|---|
1 | Flags | UINT8 | 1 | 0 |
- Flags
  - [Bit 0] The mass position is off-scale.
  - [Bit 1] Amplifier saturation detected.
  - [Bit 2] Digitizer clipping detected.
  - [Bit 3] Spikes detected.
  - [Bit 4] Glitches detected.
  - [Bit 5] A digital filter may be charging.
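The signal quality bits map naturally onto Python's `enum.IntFlag`, which gives readable names while keeping the single-byte representation. The member names below are illustrative and are not defined by the draft.

```python
from enum import IntFlag


class SignalQuality(IntFlag):
    """Bits of the signal quality flags blockette (names are illustrative)."""
    MASS_POSITION_OFF_SCALE = 1 << 0
    AMPLIFIER_SATURATION = 1 << 1
    DIGITIZER_CLIPPING = 1 << 2
    SPIKES = 1 << 3
    GLITCHES = 1 << 4
    FILTER_CHARGING = 1 << 5


def decode_flags(payload: bytes) -> SignalQuality:
    """Decode the single UINT8 payload into named flags."""
    if len(payload) != 1:
        raise ValueError("flags payload must be a single UINT8")
    return SignalQuality(payload[0])


def encode_flags(flags: SignalQuality) -> bytes:
    return bytes([int(flags)])
```

For example, `decode_flags(b"\x06")` reports both amplifier saturation and digitizer clipping.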
Deprecated miniSEED 2 flags, do not use.
Field | Field name | Type | Length | Offset |
---|---|---|---|---|
1 | Flags | UINT8 | 1 | 0 |
- Flags
  - [Bit 0] Station volume parity error possibly present.
  - [Bit 1] Long record read (possibly no problem).
  - [Bit 2] Short record read (record padded).
  - [Bit 3] Start of time series.
  - [Bit 4] End of time series.
  - [Bit 5] Telemetry synchronization error.
  - [Bit 6] Missing/padded data present.
A vendor-specific timing quality value, from 0 to 100% of maximum accuracy.
Field | Field name | Type | Length | Offset |
---|---|---|---|---|
1 | Timing quality | UINT8 | 1 | 0 |
Estimated maximum timing error in seconds.
Field | Field name | Type | Length | Offset |
---|---|---|---|---|
1 | Maximum timing error | FLOAT64 | 8 | 0 |
Time correction in seconds applied to record start time.
Field | Field name | Type | Length | Offset |
---|---|---|---|---|
1 | Time correction | FLOAT64 | 8 | 0 |
User-defined extension (JSON). Multiple instances of this blockette per record are allowed.
Field | Field name | Type | Length | Offset |
---|---|---|---|---|
1 | JSON data (UTF-8) | V | V | 0 |
User-defined extension (binary). Multiple instances of this blockette per record are allowed.
Field | Field name | Type | Length | Offset |
---|---|---|---|---|
1 | UUID | CHAR | 16 | 0 |
2 | Data payload | V | V | 16 |
- UUID
- Data type identification (https://en.wikipedia.org/wiki/Universally_unique_identifier).
- Data payload
- Data payload, corresponding to the UUID.
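A binary extension payload is a 16-byte UUID followed by opaque data, which Python's standard `uuid` module handles directly. The draft does not state the byte order of the UUID. This sketch assumes the RFC 4122 network byte order returned by `uuid.UUID.bytes`; `uuid.UUID.bytes_le` would be the alternative. The helper names are hypothetical.

```python
import uuid
from typing import Tuple


def pack_binary_extension(type_id: uuid.UUID, data: bytes) -> bytes:
    """16-byte UUID (ASSUMED RFC 4122 byte order) followed by opaque data."""
    return type_id.bytes + data


def unpack_binary_extension(payload: bytes) -> Tuple[uuid.UUID, bytes]:
    if len(payload) < 16:
        raise ValueError("payload shorter than the 16-byte UUID")
    return uuid.UUID(bytes=payload[:16]), payload[16:]
```

A producer would pick one fixed UUID per data type, for example generated once with `uuid.uuid4()`, so that readers can recognize the payloads they understand and skip the others.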
Further miniSEED 2.x blockettes (timing, detection, calibration, beam) will be converted to MS3 counterparts. Some MS2 blockettes will be split into multiple MS3 blockettes.