
Stream Metadata Layout (draft)

Flavio Junqueira edited this page Aug 11, 2020 · 2 revisions

Stream metadata is primarily stored in Table Segments. This page summarizes the main data structures used for stream metadata. The main reference for this information is the following PDP:

https://github.com/pravega/pravega/wiki/PDP-32:-Improve-scalability-of-Controller-Metadata

Three data structures are present for all streams:

  1. Stream segment record: contains stream segment information -- size is 32 bytes
  2. Epoch record: contains information about an epoch and the segments the epoch comprises -- size is 16 bytes + (# of segments) * 32 bytes
  3. History time series record: captures the delta between two consecutive epochs -- size is 16 bytes + (# of segments created/sealed) * 32 bytes
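The size formulas above can be sketched as a small calculator. This is an illustrative sketch only: the constant and function names are hypothetical and are not taken from the Pravega code base, and it assumes that "# of segments created/sealed" means the number of created segments plus the number of sealed segments in the delta.

```python
# Sizes in bytes, taken from the list above. All names are illustrative.
SEGMENT_RECORD_BYTES = 32   # one stream segment record
EPOCH_HEADER_BYTES = 16     # fixed part of an epoch record
HISTORY_HEADER_BYTES = 16   # fixed part of a history time series record


def epoch_record_bytes(num_segments):
    """Approximate size of an epoch record that comprises num_segments segments."""
    return EPOCH_HEADER_BYTES + num_segments * SEGMENT_RECORD_BYTES


def history_record_bytes(num_created, num_sealed):
    """Approximate size of a history time series record.

    Assumption: created and sealed segments each contribute one
    32-byte segment record to the delta.
    """
    return HISTORY_HEADER_BYTES + (num_created + num_sealed) * SEGMENT_RECORD_BYTES


# An epoch with 4 segments.
print(epoch_record_bytes(4))
# A scale event that seals 1 segment and creates 2.
print(history_record_bytes(num_created=2, num_sealed=1))
```

Under these assumptions, an epoch with 4 segments takes 16 + 4 * 32 = 144 bytes, and a delta that seals one segment and creates two takes 16 + 3 * 32 = 112 bytes.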

When segments are sealed, they are added to a sealed segment shard. The segments are sharded to avoid having a single data structure grow large as many segments are sealed over time. This requires a single data structure:

  1. Sealed segment map shard: contains a map of segment IDs to sizes -- size is (# of segments in the shard) * 16 bytes
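To see why sharding bounds the size of each record, the sketch below computes per-shard sizes from the formula above. It assumes, purely for illustration, that sealed segments fill shards up to a fixed capacity. The `shard_capacity` parameter and the function names are hypothetical, and the actual shard assignment rule is not described on this page.

```python
SEALED_ENTRY_BYTES = 16  # one (segment ID, size) entry, per the list above


def sealed_shard_bytes(num_segments_in_shard):
    """Approximate size of one sealed segment map shard."""
    return num_segments_in_shard * SEALED_ENTRY_BYTES


def sealed_shard_sizes(total_sealed, shard_capacity):
    """Per-shard sizes when shards are filled up to shard_capacity entries.

    Assumption: shards are filled one after another; shard_capacity is a
    hypothetical parameter, not a documented Pravega setting.
    """
    full_shards, remainder = divmod(total_sealed, shard_capacity)
    counts = [shard_capacity] * full_shards
    if remainder:
        counts.append(remainder)
    return [sealed_shard_bytes(count) for count in counts]


# 2500 sealed segments with a hypothetical capacity of 1000 entries per shard.
print(sealed_shard_sizes(2500, 1000))
```

With these illustrative numbers, no shard exceeds 1000 * 16 = 16000 bytes, whereas a single unsharded map would hold all 2500 entries (40000 bytes) and keep growing.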

If the stream has a retention policy configured, the following data structures are also created and stored:

  1. Retention set: contains a list of stream cut reference records -- size is (# of records) * 16 bytes
  2. Stream cut reference record: records time and size -- size is 16 bytes
  3. Stream cut record: contains stream cut information for each cut -- size is 16 bytes + (# of segments in stream cut) * 16 bytes
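The retention-related sizes can be combined in a similar way. The sketch below is a rough estimate, not an authoritative accounting: the names are hypothetical, and it assumes that the retention set is exactly the list of 16-byte stream cut reference records and that each reference record has one corresponding stream cut record.

```python
from typing import Sequence

STREAM_CUT_REFERENCE_BYTES = 16  # time and size, per the list above
STREAM_CUT_HEADER_BYTES = 16     # fixed part of a stream cut record
STREAM_CUT_ENTRY_BYTES = 16      # per segment in the stream cut


def retention_set_bytes(num_records):
    """Approximate size of the retention set (a list of reference records)."""
    return num_records * STREAM_CUT_REFERENCE_BYTES


def stream_cut_record_bytes(num_segments):
    """Approximate size of one stream cut record."""
    return STREAM_CUT_HEADER_BYTES + num_segments * STREAM_CUT_ENTRY_BYTES


def retention_metadata_bytes(segments_per_cut: Sequence[int]) -> int:
    """Total for the retention set plus one stream cut record per entry.

    Assumption: one stream cut record exists for each reference record.
    """
    cut_records = sum(stream_cut_record_bytes(n) for n in segments_per_cut)
    return retention_set_bytes(len(segments_per_cut)) + cut_records


# Three retained stream cuts covering 4, 4, and 8 segments.
print(retention_metadata_bytes([4, 4, 8]))
```

For three stream cuts covering 4, 4, and 8 segments, the estimate is 3 * 16 = 48 bytes for the retention set plus 80 + 80 + 144 = 304 bytes for the stream cut records, 352 bytes in total.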

In general, it is difficult to estimate the byte overhead of stream metadata, since it changes as the stream evolves, and may contain temporary information such as transaction metadata.
