Stream Metadata Layout (draft)

Stream metadata is primarily stored in Table Segments. This page summarizes the main data structures used for stream metadata. The main reference for this information is the following PDP:

https://github.com/pravega/pravega/wiki/PDP-32:-Improve-scalability-of-Controller-Metadata

Three data structures are present for all streams:

Stream segment record: contains stream segment information -- size is 32 bytes
Epoch record: contains information about an epoch and the segments the epoch comprises -- size is 16 bytes + (# of segments) * 32 bytes
History time series record: captures the delta between two consecutive epochs -- size is 16 bytes + (# of segments created/sealed) * 32 bytes
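
As a rough illustration, the size formulas above can be written as small helper functions. This is only a sketch: the function and constant names are hypothetical and not part of Pravega, and it assumes that "# of segments created/sealed" means the number of created segments plus the number of sealed segments in the epoch transition.

```python
# Sizes taken from the list above; all names here are illustrative only.
SEGMENT_RECORD_BYTES = 32   # one stream segment record
FIXED_OVERHEAD_BYTES = 16   # fixed part of epoch and history records


def epoch_record_bytes(num_segments: int) -> int:
    """Approximate size of an epoch record holding num_segments segments."""
    return FIXED_OVERHEAD_BYTES + num_segments * SEGMENT_RECORD_BYTES


def history_record_bytes(num_created: int, num_sealed: int) -> int:
    """Approximate size of a history time series record.

    Assumption: created and sealed segments are counted together.
    """
    return FIXED_OVERHEAD_BYTES + (num_created + num_sealed) * SEGMENT_RECORD_BYTES


# An epoch with 4 segments: 16 + 4 * 32 bytes.
print(epoch_record_bytes(4))        # 144
# A scale event that seals 1 segment and creates 2: 16 + 3 * 32 bytes.
print(history_record_bytes(2, 1))   # 112
```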

When segments are sealed, they are added to a sealed segment shard. Sealed segments are sharded so that no single data structure grows too large as many segments are sealed over time. This requires one data structure:

Sealed segment map shard: contains a map of segment IDs to sizes -- size is (# of segments in the shard) * 16 bytes
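
To see how sharding bounds the size of each structure, the sketch below computes per-shard sizes from the formula above. The shard capacity (segments_per_shard) and the way segments are packed into shards are assumptions made for illustration; this page does not say how Pravega assigns sealed segments to shards.

```python
SEALED_ENTRY_BYTES = 16  # one segment ID -> size entry, per the formula above


def sealed_shard_bytes(num_segments_in_shard: int) -> int:
    """Approximate size of one sealed segment map shard."""
    return num_segments_in_shard * SEALED_ENTRY_BYTES


def shard_sizes(total_sealed: int, segments_per_shard: int) -> list[int]:
    """Sizes of all shards if shards are filled one after another.

    segments_per_shard is a hypothetical capacity, not a Pravega setting.
    """
    if segments_per_shard <= 0:
        raise ValueError("segments_per_shard must be positive")
    full_shards, remainder = divmod(total_sealed, segments_per_shard)
    sizes = [sealed_shard_bytes(segments_per_shard)] * full_shards
    if remainder:
        sizes.append(sealed_shard_bytes(remainder))
    return sizes


# 2500 sealed segments with at most 1000 per shard.
print(shard_sizes(2500, 1000))  # [16000, 16000, 8000]
```

No shard exceeds segments_per_shard * 16 bytes, whereas a single unsharded map would grow with the total number of sealed segments.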

If the stream has a retention policy configured, the following data structures are also created and stored:

Retention set: contains a list of stream cut reference records -- size is (# of records) * 16 bytes
Stream cut reference record: records time and size -- size is 16 bytes
Stream cut record: contains stream cut information for each cut -- size is 16 bytes + (# of segments in stream cut) * 16 bytes
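
The retention formulas can be combined into a rough estimate, sketched below. The helper names are hypothetical, and the sketch assumes that every stream cut reference record in the retention set has exactly one corresponding stream cut record.

```python
REFERENCE_RECORD_BYTES = 16   # stream cut reference record (time and size)
CUT_FIXED_BYTES = 16          # fixed part of a stream cut record
CUT_PER_SEGMENT_BYTES = 16    # per-segment part of a stream cut record


def retention_set_bytes(num_records: int) -> int:
    """Approximate size of the retention set."""
    return num_records * REFERENCE_RECORD_BYTES


def stream_cut_record_bytes(num_segments: int) -> int:
    """Approximate size of one stream cut record."""
    return CUT_FIXED_BYTES + num_segments * CUT_PER_SEGMENT_BYTES


def retention_metadata_bytes(segments_per_cut: list[int]) -> int:
    """Retention set plus one stream cut record per reference (assumed)."""
    cut_records = sum(stream_cut_record_bytes(n) for n in segments_per_cut)
    return retention_set_bytes(len(segments_per_cut)) + cut_records


# Three stream cuts spanning 4, 4, and 6 segments:
# retention set 3 * 16 = 48, cut records 80 + 80 + 112 = 272.
print(retention_metadata_bytes([4, 4, 6]))  # 320
```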

In general, it is difficult to estimate the byte overhead of stream metadata, since it changes as the stream evolves, and may contain temporary information such as transaction metadata.
