Initial multi channel trace file format specification #841

Open
wants to merge 7 commits into
base: master
Changes from 4 commits
188 changes: 184 additions & 4 deletions doc/architecture/trace_file_formats.adoc
@@ -5,20 +5,200 @@ endif::[]
[#top-osi_trace_file_formats]
= OSI trace file formats

There are two formats for storing multiple serialized OSI messages in one trace file.
== Single channel trace file formats
There are two formats for storing and reading multiple serialized OSI messages of the same type in one trace file.
These formats are minimal: they contain no additional information such as meta-data or schemas, and they do not allow random access.
For more advanced use-cases, consider using the multi channel trace file format.

*.osi::
Binary trace file.
Single channel binary trace file.
Messages are separated by a length specification before each message.
The length is represented by a four-byte, little-endian, unsigned integer.
The length does not include the integer itself.

*.txth::
Human-readable plain-text trace file.
Single channel human-readable plain-text trace file.
Messages are separated by newlines.
Each message is a serialized OSI message in protocol buffer text format.

NOTE: The `.txth` format is intended for human consumption (e.g. for debugging and manual checks).
It is currently not supported for reading by the OSI API, as it is not unambiguously deserializable.

NOTE: Previous releases of OSI also supported a so-called plain-text trace file format, with file extension `.txt`.
This legacy format did not contain plain-text, but rather binary protobuf messages separated by a special separator.
For obvious reasons the format was deprecated and fully replaced with the `.osi` binary file format.
This release no longer contains any support for the legacy `.txt` file format.
These files may be used for manual checks.
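
The length-prefixed framing of the `.osi` format described above can be illustrated with a minimal sketch. It uses only the Python standard library and treats message payloads as opaque bytes; the function names `write_osi_message` and `read_osi_messages` are hypothetical and not part of any OSI API. Deserializing the payloads into OSI messages is left out.

```python
import io
import struct
from typing import BinaryIO, Iterator

# Four-byte, little-endian, unsigned length; the length excludes the prefix.
LENGTH_PREFIX = struct.Struct("<I")


def write_osi_message(stream: BinaryIO, payload: bytes) -> None:
    """Append one serialized message, preceded by its length."""
    stream.write(LENGTH_PREFIX.pack(len(payload)))
    stream.write(payload)


def read_osi_messages(stream: BinaryIO) -> Iterator[bytes]:
    """Yield the serialized messages of a single channel binary trace."""
    while True:
        header = stream.read(LENGTH_PREFIX.size)
        if not header:
            return  # clean end of file
        if len(header) < LENGTH_PREFIX.size:
            raise ValueError("truncated length prefix")
        (length,) = LENGTH_PREFIX.unpack(header)
        payload = stream.read(length)
        if len(payload) < length:
            raise ValueError("truncated message")
        yield payload


# Round trip with placeholder payloads instead of real serialized OSI messages.
buffer = io.BytesIO()
for payload in (b"first", b"", b"third message"):
    write_osi_message(buffer, payload)
buffer.seek(0)
messages = list(read_osi_messages(buffer))
```

Because the file carries no index, a reader has to walk the prefixes sequentially from the start, which is why these formats do not allow random access.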

== Multi channel trace file format

=== Overview

The OSI multi channel trace file format is a binary file format that allows for storing multiple serialized OSI message streams of the same or different types in one trace file, along with additional meta-data, and other related data streams.
Due to the nature of the format, it allows for random access to the data streams, and is suitable for more advanced use-cases.

The OSI multi channel trace file format is based on the MCAP file format, which is a generic multi channel trace file format.
The OSI multi channel trace file format is a specialization of the MCAP file format, with additional constraints and requirements specific to OSI.
Hence, any valid OSI multi channel trace file is also a valid MCAP file, but not the other way around.

The following rules apply to OSI multi channel trace files:

- The file extension to be used is `.mcap`.
- The file must be a valid MCAP file according to the https://mcap.dev/spec[MCAP format specification] version `0x30`.
Contributor

@TimmRuppert I cannot find a version number for the MCAP specification itself, only version numbers for the associated libraries and CLI, which differ between Python, C++, and so on. I am a bit confused. Is this specification version 0?

Contributor Author

The spec goes about it in a very round-about way:

Magic
An MCAP file must begin and end with the following magic bytes:

0x89, M, C, A, P, 0x30, \r, \n

The byte following "MCAP" is the major version byte. 0x30 is the ASCII character 0. Any changes to this specification document (i.e. adding fields to records, introducing new records) will be binary backward-compatible within the major version.

(Emphasis by me)

So it is stated quite unclearly, but I would treat this as the major version of the specification, given the statements.

Contributor

That is what confuses me: "include the major version like this" but not "this spec IS major version 0" ...

- The file must be an indexed MCAP file, including chunk index records in the summary section, with all message records written into chunk records.
- Chunk records must be either uncompressed or compressed using the `zstd` or `lz4` compression algorithm.
- The file must contain a meta-data record with the OSI trace file meta-data defined in section <<sec-osi-trace-file-multi-global-meta-data>>.
This meta-data record identifies the file as an OSI multi channel trace file.
- The file must contain a schema record in the summary section for each top-level message type that is used in one or more OSI channels with the OSI message schema as defined in section <<sec-osi-trace-file-multi-schema-record>>.
- The file must contain at least one OSI message stream in a channel as defined in section <<sec-osi-trace-file-multi-channel>>.
- The file may contain additional non-OSI message streams in other channels.

NOTE: All OSI and non-OSI message streams stored in the same file share a common time base.
Storing of unrelated message streams in one trace file is therefore not generally useful and should be avoided.
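
As a cheap first check before applying the rules above, a tool could test for the MCAP magic bytes quoted in the review discussion (`0x89, M, C, A, P, 0x30, \r, \n` at both the start and the end of the file). The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and passing this check does not establish that a file is a valid MCAP file, let alone a valid OSI multi channel trace file.

```python
import io
from typing import BinaryIO

# 0x89, M, C, A, P, 0x30, \r, \n as quoted from the MCAP specification.
MCAP_MAGIC = b"\x89MCAP0\r\n"


def has_mcap_magic(stream: BinaryIO) -> bool:
    """Return True if a seekable stream starts and ends with the MCAP magic."""
    stream.seek(0, io.SEEK_END)
    size = stream.tell()
    if size < 2 * len(MCAP_MAGIC):
        return False
    stream.seek(0)
    head = stream.read(len(MCAP_MAGIC))
    stream.seek(-len(MCAP_MAGIC), io.SEEK_END)
    tail = stream.read(len(MCAP_MAGIC))
    return head == MCAP_MAGIC and tail == MCAP_MAGIC


def major_version_char(stream: BinaryIO) -> str:
    """Return the byte following "MCAP" as an ASCII character."""
    stream.seek(5)
    return stream.read(1).decode("ascii")


# Placeholder content between the magic bytes; not real MCAP records.
fake_file = io.BytesIO(MCAP_MAGIC + b"records" + MCAP_MAGIC)
looks_like_mcap = has_mcap_magic(fake_file)
major = major_version_char(fake_file)
```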

[#sec-osi-trace-file-multi-global-meta-data]
=== Multi channel trace file global meta-data

The file must contain a meta-data record with the `name` field being `net.asam.osi.trace` and the following mandatory `metadata` entries:

`version`::
The semantic version number of the OSI release that this OSI trace file conforms to.
This is a string in the format `major.minor.patch`, for example `3.0.0`.
Note that this version number is not necessarily the same as the OSI schema version(s) used in the trace file.
Rather, it indicates the version of the OSI trace file format itself.

`min_osi_version`::
The minimum version of the OSI schema used in the trace file OSI channels.
This is a string in the format `major.minor.patch`, for example `3.0.0`.

`max_osi_version`::
The maximum version of the OSI schema used in the trace file OSI channels.
This is a string in the format `major.minor.patch`, for example `3.0.0`.

`min_protobuf_version`::
The minimum version of the protobuf implementation used in creating the data of the OSI channels in the trace file.
This is a string in the format `major.minor.patch`, for example `3.17.3`.

`max_protobuf_version`::
The maximum version of the protobuf implementation used in creating the data of the OSI channels in the trace file.
This is a string in the format `major.minor.patch`, for example `3.17.3`.

`zero_time`::
The point in time corresponding to time 0 in all timestamps in the trace file.
Must be provided as a full ISO8601 formatted date time string, including timezone data, conforming to the https://www.w3.org/TR/xmlschema11-2/#dateTimeStamp[XML Schema dateTimeStamp] lexical space.
Values must match the following regular expression:
`-?([1-9][0-9]{3,}|0[0-9]{3})-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])T(([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9](\.[0-9]+)?|(24:00:00(\.0+)?))(Z|(\+|-)((0[0-9]|1[0-3]):[0-5][0-9]|14:00))`
+
NOTE: Even in pure simulation use cases there is usually a relationship to real time, because on-board components and environment simulation necessarily relate to real time (for example, the embedded `HostVehicleData` carries relevant real-time information for localization and other purposes).

`creation_time`::
The point in time when the trace file was created.
Must be provided as a full ISO8601 formatted date time string, including timezone data, conforming to the https://www.w3.org/TR/xmlschema11-2/#dateTimeStamp[XML Schema dateTimeStamp] lexical space.
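
The mandatory entries above lend themselves to a simple consistency check. The sketch below, using only the Python standard library, represents the `metadata` map as a plain dictionary and reports missing or malformed entries. The function name `check_trace_metadata` and the sample values are hypothetical. The `SEMVER` pattern is an assumption that reads `major.minor.patch` as three dot-separated decimal numbers, and applying the `zero_time` regular expression to `creation_time` is likewise an assumption based on both entries referring to the same dateTimeStamp lexical space.

```python
import re

# Assumption: "major.minor.patch" means three dot-separated decimal numbers.
SEMVER = re.compile(r"[0-9]+\.[0-9]+\.[0-9]+")

# Regular expression for zero_time, copied from the text above.
DATE_TIME_STAMP = re.compile(
    r"-?([1-9][0-9]{3,}|0[0-9]{3})-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])"
    r"T(([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9](\.[0-9]+)?|(24:00:00(\.0+)?))"
    r"(Z|(\+|-)((0[0-9]|1[0-3]):[0-5][0-9]|14:00))"
)

VERSION_KEYS = (
    "version",
    "min_osi_version",
    "max_osi_version",
    "min_protobuf_version",
    "max_protobuf_version",
)
TIME_KEYS = ("zero_time", "creation_time")


def check_trace_metadata(metadata: dict) -> list:
    """Return a list of problems found in a net.asam.osi.trace metadata map."""
    problems = []
    for key in VERSION_KEYS + TIME_KEYS:
        if key not in metadata:
            problems.append(f"missing mandatory entry: {key}")
    for key in VERSION_KEYS:
        if key in metadata and not SEMVER.fullmatch(metadata[key]):
            problems.append(f"{key} is not in major.minor.patch format")
    for key in TIME_KEYS:
        if key in metadata and not DATE_TIME_STAMP.fullmatch(metadata[key]):
            problems.append(f"{key} is not a dateTimeStamp with timezone")
    return problems


# Placeholder values for illustration only.
sample = {
    "version": "3.0.0",
    "min_osi_version": "3.0.0",
    "max_osi_version": "3.0.0",
    "min_protobuf_version": "3.17.3",
    "max_protobuf_version": "3.17.3",
    "zero_time": "2021-08-18T15:05:42Z",
    "creation_time": "2021-08-18T17:05:42+02:00",
}
sample_problems = check_trace_metadata(sample)
```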

The `net.asam.osi.trace` meta-data record may also contain the following recommended `metadata` entries:

`description`::
A human-readable description of the data contained in the multi channel trace file.

`creator`::
A comma-separated list of entities (not tools) involved in the creation of the data contained in the file.

`license`::
If the contents of the file are licensed under any SPDX registered license, this entry should contain the SPDX identifier for the license.

`data_sources`::
A comma-separated list of data sources used in the creation of the data contained in the file.

The file may contain arbitrary additional meta-data records; however, meta-data records with names starting with `net.asam.osi` are reserved for future use by {THIS_STANDARD}.
It is strongly recommended to follow reverse domain name notation for custom meta-data record names to avoid conflicts.

[#sec-osi-trace-file-multi-schema-record]
=== OSI message schema

For each OSI top-level message type that is used in one or more OSI channels, the OSI multi channel trace file must contain a corresponding schema record in the summary section.
Note that if multiple versions of the OSI schema are used in the same trace file, a schema record must be included for each version, with different schema IDs.

The schema record must contain the following fields:

`id`::
A file-wide unique non-zero identifier for the schema record.

`name`::
The fully qualified name of the message within the OSI descriptor set. For example, for the `SensorView` message type, this would be `osi3.SensorView`.

`encoding`::
The value `protobuf`.

`data`::
A binary FileDescriptorSet as produced by `protoc --include_imports --descriptor_set_out`.

The schema record must be stored in the summary section of the trace file, and must be referenced by the OSI channels that use the schema.

[#sec-osi-trace-file-multi-channel]
=== OSI channel

An OSI channel is a data stream within the OSI multi channel trace file that contains serialized OSI top-level messages of the same type.
Note that non-top-level messages must not be stored directly in OSI channels.

Each OSI channel must be described by a channel record in the summary section of the trace file with the following fields:

`id`::
A file-wide unique identifier for the channel.

`schema_id`::
The ID of the schema record that describes the message type of the channel.

`topic`::
A unique name for the channel within the trace file.
When recording OSI traces for a model packaged according to the OSI Sensor Model Packaging (OSMP) layer, using the naming conventions defined in the OSMP specification for variables as topics is recommended.
For example, for a sensor model with two SensorView inputs and one SensorData output, the topic names would be `OSMPSensorViewIn[1]`, `OSMPSensorViewIn[2]`, and `OSMPSensorDataOut`, respectively.
In other cases, the topic name should be chosen to reflect the purpose of the channel, and should include some indication of the message type.

`message_encoding`::
The value `protobuf`.

`metadata`::
A map of additional meta-data for the channel.
This map may contain arbitrary key-value pairs; however, keys starting with `net.asam.osi` are reserved for use by {THIS_STANDARD}.
It is strongly recommended to follow reverse domain name notation for custom meta-data keys to avoid conflicts.
The following mandatory entries are defined by {THIS_STANDARD}:

`net.asam.osi.trace.channel.osi_version`:::
The version of the OSI schema used in creating the data of this OSI channel.
This is a string in the format `major.minor.patch`, for example `3.0.0`.

`net.asam.osi.trace.channel.protobuf_version`:::
The version of the protobuf implementation used in creating the data of this OSI channel.
This is a string in the format `major.minor.patch`, for example `3.17.3`.

The following recommended entries are defined by {THIS_STANDARD}:

`net.asam.osi.trace.channel.description`:::
A human-readable description of the channel.

The channel record must be stored in the summary section of the trace file, and must be referenced by the OSI message records that are part of the channel.
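
The channel requirements above can be sketched as a small check over channel and schema records. The example uses only the Python standard library and models records as plain dictionaries keyed by the field names from the text; this representation and the function names `check_osi_channel` and `duplicate_topics` are hypothetical and do not correspond to any particular MCAP library.

```python
OSI_VERSION_KEY = "net.asam.osi.trace.channel.osi_version"
PROTOBUF_VERSION_KEY = "net.asam.osi.trace.channel.protobuf_version"


def check_osi_channel(channel: dict, schemas_by_id: dict) -> list:
    """Return a list of problems for one OSI channel record."""
    problems = []
    schema = schemas_by_id.get(channel.get("schema_id"))
    if schema is None:
        problems.append("schema_id does not reference a schema record")
    elif schema.get("encoding") != "protobuf":
        problems.append("schema encoding must be 'protobuf'")
    if channel.get("message_encoding") != "protobuf":
        problems.append("message_encoding must be 'protobuf'")
    metadata = channel.get("metadata", {})
    for key in (OSI_VERSION_KEY, PROTOBUF_VERSION_KEY):
        if key not in metadata:
            problems.append(f"missing mandatory channel meta-data: {key}")
    return problems


def duplicate_topics(channels: list) -> set:
    """Return the topics that are used by more than one channel."""
    seen, duplicates = set(), set()
    for channel in channels:
        topic = channel["topic"]
        (duplicates if topic in seen else seen).add(topic)
    return duplicates


# Placeholder records for illustration only.
schemas = {1: {"id": 1, "name": "osi3.SensorView", "encoding": "protobuf"}}
channels = [
    {
        "id": 1,
        "schema_id": 1,
        "topic": "OSMPSensorViewIn[1]",
        "message_encoding": "protobuf",
        "metadata": {OSI_VERSION_KEY: "3.0.0", PROTOBUF_VERSION_KEY: "3.17.3"},
    },
    {
        "id": 2,
        "schema_id": 1,
        "topic": "OSMPSensorViewIn[2]",
        "message_encoding": "protobuf",
        "metadata": {OSI_VERSION_KEY: "3.0.0"},
    },
]
channel_problems = [check_osi_channel(c, schemas) for c in channels]
```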

All messages in an OSI channel must be stored in chunk records in the data section of the trace file.

Each message record in a chunk record must contain the following fields:

`channel_id`::
The ID of the channel that the message belongs to.

`sequence`::
Optional message counter to detect message gaps.
If the relevant packaging layer or other source of messages provides a sequence number, this can be used.
Otherwise, this should be set to zero to indicate that no reliable sequence number is available.

`log_time`::
This field is in nanoseconds and uses the same epoch as the `publish_time` field.
It is used to determine the order of messages in the trace file, and provides for time-based random access to the data streams.
Unless there is a specific reason to set this field to a different value, it should be set to the same value as `publish_time`, as this reflects the flow of time in the OSI message stream.
Only if the message stream must be recreated with actual message transmission times, for example for asynchronous packaging layers, should this field be set to the time when the message was enqueued for addition to the trace file.

`publish_time`::
The timestamp taken from the timestamp field of the stored OSI top-level message.
The field is in nanoseconds, with the epoch being the epoch of the OSI Timestamp data type.
If top-level messages that do not contain a timestamp field are stored in the trace file, the `publish_time` field must be set to the time when the message was enqueued for addition to the trace file.

`data`::
The serialized OSI message data.
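
The relationship between `publish_time` and `log_time` described above can be sketched as follows. The example assumes the OSI `Timestamp` layout of a `seconds` field plus a `nanos` field and converts it to a single nanosecond count. The function names are hypothetical, and the optional `enqueue_time_ns` argument is assumed to already use the same epoch as the OSI timestamps.

```python
from typing import Optional, Tuple

NANOS_PER_SECOND = 1_000_000_000


def osi_timestamp_to_ns(seconds: int, nanos: int) -> int:
    """Convert an OSI Timestamp (seconds, nanos) into nanoseconds."""
    return seconds * NANOS_PER_SECOND + nanos


def message_times(
    seconds: int, nanos: int, enqueue_time_ns: Optional[int] = None
) -> Tuple[int, int]:
    """Return (log_time, publish_time) for one message record.

    log_time defaults to publish_time; it is only overridden when actual
    transmission times need to be preserved.
    """
    publish_time = osi_timestamp_to_ns(seconds, nanos)
    log_time = publish_time if enqueue_time_ns is None else enqueue_time_ns
    return log_time, publish_time


default_times = message_times(2, 5)
async_times = message_times(2, 5, enqueue_time_ns=2_000_000_100)
```

Keeping `log_time` equal to `publish_time` by default means that time-based random access follows the simulated time of the message stream rather than the wall-clock time of the recording process.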
14 changes: 13 additions & 1 deletion doc/architecture/trace_file_naming.adoc
@@ -9,7 +9,7 @@ endif::[]
The names of OSI trace files should have the following format:

----
<timestamp>_<type>_<osi-version>_<protobuf-version>_<number-of-frames>_<custom-trace-name>.osi
<timestamp>_<type>_<osi-version>_<protobuf-version>_<number-of-frames>_<custom-trace-name>.[osi|txth|mcap]
----

**Types**
@@ -44,6 +44,12 @@ Trace file contains `MotionRequest` messages.
`su`::
Trace file contains `StreamingUpdate` messages.

`multi`::
Trace file contains multiple types of messages for use with the multi channel trace file format.
In this case the number-of-frames field should be the largest number of frames across all channels.
The OSI version field should be based on the `version` field in the file meta-data of the multi-channel trace file.
The protobuf version field should be based on the `min_protobuf_version` field in the file meta-data of the multi-channel trace file.

**Example**

Given an OSI trace file with the following information:
@@ -76,3 +82,9 @@ The recommended file name is:
----
20210818T150542Z_sv_312_300_1523_highway.osi
----

For a corresponding multi channel trace file containing `SensorView` and `GroundTruth` messages, the recommended file name is:

----
20210818T150542Z_multi_312_300_1523_highway.mcap
----
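
The naming scheme can be sketched as a small helper that joins the fields with underscores. The function name `trace_file_name` is hypothetical, and the version fields are assumed to be passed already in the compact form shown in the examples (such as `312`), since the conversion from a `major.minor.patch` string is not covered here. For the `multi` type, the frame count is taken as the largest count across all channels; the per-channel counts below are placeholders.

```python
ALLOWED_EXTENSIONS = ("osi", "txth", "mcap")


def trace_file_name(
    timestamp: str,
    message_type: str,
    osi_version: str,
    protobuf_version: str,
    number_of_frames: int,
    custom_name: str,
    extension: str,
) -> str:
    """Assemble a trace file name from its pre-formatted fields."""
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported extension: {extension}")
    fields = [
        timestamp,
        message_type,
        osi_version,
        protobuf_version,
        str(number_of_frames),
        custom_name,
    ]
    return "_".join(fields) + "." + extension


# Placeholder frame counts per channel; the largest one goes into the name.
frames_per_channel = {"SensorView": 1523, "GroundTruth": 1200}
name = trace_file_name(
    "20210818T150542Z",
    "multi",
    "312",
    "300",
    max(frames_per_channel.values()),
    "highway",
    "mcap",
)
```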