
PDP 43A (Large Events Simplified)

Derek Moore edited this page Jul 12, 2021 · 1 revision

Status: UNDER REVIEW.

NOTE: this is an alternative to PDP-43. PDP-43 is not discarded or otherwise abandoned - this page (PDP-43A) presents an alternate approach to getting this done.

Discussion and comments in Issue 6052.

Table of Contents:

  1. Motivation
  2. Requirements
  3. Other Considered Approaches
  4. Design
  5. Segment Store Changes
  6. Client and Wire Protocol Changes

Motivation

See PDP-43.

Requirements

See PDP-43.

Changes:

  • Requirement 2 is relaxed (ingestion speed may not match that of classical appends, as long as it stays within a reasonable range).
  • Requirement 3 is dropped.

Other Considered Approaches

See PDP-43.

Design

See PDP-43. The entire design stays the same, except for the changes below.

Changes:

  • Transient Segments are created via a new wire command, CreateTransientSegment, which names the parent segment. The Segment Store will generate a unique name for the Transient Segment and enter it into the metadata. This name is not returned to the Client.
  • Transient Segments will not allow any Extended Attributes.
  • Unmerged Transient Segments will be cleaned up (eventually) after a container recovery.
  • If a connection is dropped, the Append Processor will lose its reference to the Transient Segment.
  • Transient Segments are cleaned up periodically (as PDP-43 explains), most likely triggered by MetadataCleanupService running in each container.
  • There is no "Merge With Header" operation like PDP-43 asks for. The Transient Segment will be merged as-is using the regular merge command. It is up to the Client/Append Processor to add appropriate Event envelopes for the large event.
  • The section Writer Side is dropped.
  • The section Reader Side is dropped.
  • The section Event Serialization is dropped.
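
As an illustration of how the Segment Store might generate and recognize such a unique name, here is a minimal sketch. It assumes a layout of parent name, a dedicated delimiter, and a 128-bit id rendered as hex, following the general idea of borrowing from transaction naming (see the NameUtils.java item under Segment Store Changes). The class TransientNames, the "#transient." delimiter, and all method names are hypothetical; they are not part of NameUtils.java.

```java
import java.util.UUID;

/** Hypothetical helper for illustration only; not the actual Pravega NameUtils API. */
final class TransientNames {
    // Assumed delimiter. It must differ from the one used for transactions so names cannot clash.
    static final String TRANSIENT_DELIMITER = "#transient.";
    // A 128-bit id rendered as two zero-padded 64-bit hex values.
    private static final int ID_LENGTH = 32;

    /** Builds "<parent>#transient.<32 hex chars>". */
    static String transientName(String parentSegment, UUID id) {
        return parentSegment + TRANSIENT_DELIMITER
                + String.format("%016x%016x", id.getMostSignificantBits(), id.getLeastSignificantBits());
    }

    /** Returns true only if the name ends with the delimiter followed by exactly 32 lowercase hex chars. */
    static boolean isTransient(String name) {
        int pos = name.lastIndexOf(TRANSIENT_DELIMITER);
        if (pos < 0) {
            return false;
        }
        String suffix = name.substring(pos + TRANSIENT_DELIMITER.length());
        return suffix.length() == ID_LENGTH && suffix.matches("[0-9a-f]+");
    }

    /** Extracts the parent segment name, or null if the name does not match the transient pattern. */
    static String parentOf(String name) {
        if (!isTransient(name)) {
            return null;
        }
        return name.substring(0, name.lastIndexOf(TRANSIENT_DELIMITER));
    }
}
```

Because the parent name is embedded in the generated name, a check like isTransient could back the name verification that MetadataStore is expected to perform, and parentOf could recover the merge target without extra metadata.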

Segment Store Changes

  • NameUtils.java
    • Create API to properly name transient segments. To make this a lot easier, we should borrow from the scheme we use to name transactions, and we should make sure those names cannot possibly clash.
  • New Segment Type: Transient
    • This must be added as a "Role" to SegmentType.
    • A Segment may not be a TableSegment and Transient at the same time.
  • MetadataStore.java
    • If SegmentType.isTransient(), then verify the name of the segment matches the transient pattern (as defined in NameUtils.java) and DO NOT insert it into the Metadata Table. Instead, create a StreamSegmentMapOperation (with Pinned==true) and queue it up in the DurableLog. This ensures the segment is pinned to memory (will never be evicted) and we won't ever bother the Metadata Table with it.
  • SegmentMetadataUpdateTransaction.preProcessAttributes
    • Disallow extended Attributes (Attributes.isCoreAttribute(...) == false) for transient segments.
  • AppendProcessor
    • When receiving a CreateTransientSegment, generate a transient segment name using NameUtils, and create that segment on the store, making sure the SegmentType is set to transient (see above).
    • Add a close() method on AppendProcessor which is invoked when the connection closes and will in turn delete any open Transient segments.
    • TODO: we need to figure out how to handle the merging of this segment. Merge is currently handled by PravegaRequestProcessor, which has no connection to AppendProcessor. Since the name of the transient segment is not sent back to the client, the client may not know which segment to seal.
  • MetadataCleanupService
    • Every time it runs (even in force-runs due to memory pressure), collect all Transient Segments which haven't been used in a while and are not merged. Delete those segments.
    • The Metadata Cleanup service already has a way to track segments to evict. This may or may not be useful as-is; for Transient Segments we must ensure that a certain amount of time has elapsed since they were last used. The service currently tracks whether segments are in use by anyone and executes every 60 seconds; this may not be appropriate. To handle this, we may need another field on SegmentMetadata to keep track of last access time (in clock time). This time should be set to Long.MIN_VALUE after recoveries (to ensure speedy deletion of transient segments that haven't been merged).
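
The last-access idea in the final item can be sketched as follows. This is only an illustration under stated assumptions: the class TransientSegmentTracker, its fields, and the idle timeout parameter are hypothetical and do not correspond to existing SegmentMetadata or MetadataCleanupService members.

```java
/** Hypothetical sketch; names are illustrative and not part of Pravega's SegmentMetadata. */
final class TransientSegmentTracker {
    // Sentinel assigned after a container recovery so unmerged transient segments expire at once.
    static final long RECOVERED = Long.MIN_VALUE;

    private long lastAccessMillis;
    private boolean merged;

    TransientSegmentTracker(long nowMillis) {
        this.lastAccessMillis = nowMillis;
    }

    /** Called whenever the segment is used (for example, on append). */
    void recordAccess(long nowMillis) {
        this.lastAccessMillis = nowMillis;
    }

    /** Called for every transient segment found during recovery. */
    void markRecovered() {
        this.lastAccessMillis = RECOVERED;
    }

    void markMerged() {
        this.merged = true;
    }

    /** Decides whether a cleanup pass may delete this segment. */
    boolean isEligibleForDeletion(long nowMillis, long idleTimeoutMillis) {
        if (merged) {
            return false; // merged segments are not candidates for this cleanup
        }
        if (lastAccessMillis == RECOVERED) {
            return true;  // checked explicitly: nowMillis - Long.MIN_VALUE would overflow
        }
        return nowMillis - lastAccessMillis >= idleTimeoutMillis;
    }
}
```

Checking the sentinel before subtracting matters: with a clock value of zero or more, nowMillis - Long.MIN_VALUE wraps around to a negative number, so a plain elapsed-time comparison would never report a recovered segment as idle.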

Client and Wire Protocol Changes

Wire Protocol

  • A new wire command, CreateTransientSegment, needs to be added, along with a corresponding reply, TransientSegmentCreated.
    • Either the TransientSegmentCreated reply would contain the segment ID so that the subsequent SetupAppend could write data to it; or
    • the Client may generate a unique name for the segment and send that name along. This would solve the PravegaRequestProcessor.merge problem discussed above.

Client

  • The Writer needs to detect if an incoming Event is greater than the allowed limit of 8MB. If so, it must follow the Large Event protocol:
    1. CreateTransientSegment
    2. SetupAppend
    3. Append to transient segment
    4. Call flush on the target segment (to preserve order)
    5. Merge transient segment into the target segment
  • The writer needs to ensure order. When a large event is detected, it must, under lock:
    1. create a new transient segment
    2. write the large event to the transient segment
    3. flush the target segment
    4. flush the transient segment
    5. merge the transient segment
    6. wait for the ack. (If a SegmentIsSealed error is returned instead, steps 1-6 are repeated.)
  • Reader
    • The reader will need to allocate enough memory to load the entire large Event.
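
The ordered, under-lock sequence for the writer can be sketched as below. This is a simplified illustration, not the client implementation: SegmentConnection and LargeEventWriter are hypothetical types, merge returning false stands in for a SegmentIsSealed reply, the retry bound maxAttempts is an assumption added to keep the sketch finite, and the Event envelope that the Client must add is omitted.

```java
/** Hypothetical connection abstraction; these methods are NOT the real Pravega client API. */
interface SegmentConnection {
    String createTransientSegment(String parentSegment); // returns a handle for the new segment
    void append(String segment, byte[] data);
    void flush(String segment);
    boolean merge(String source, String target);          // false models a SegmentIsSealed reply
}

final class LargeEventWriter {
    static final int MAX_EVENT_SIZE = 8 * 1024 * 1024;    // the 8MB limit named in the text

    private final Object lock = new Object();
    private final SegmentConnection conn;
    private final String targetSegment;
    private final int maxAttempts;                         // assumption: bound retries in this sketch

    LargeEventWriter(SegmentConnection conn, String targetSegment, int maxAttempts) {
        this.conn = conn;
        this.targetSegment = targetSegment;
        this.maxAttempts = maxAttempts;
    }

    void write(byte[] event) {
        synchronized (lock) {                              // no other write may interleave
            if (event.length <= MAX_EVENT_SIZE) {
                conn.append(targetSegment, event);         // regular append path
                return;
            }
            for (int attempt = 0; attempt < maxAttempts; attempt++) {
                String transientSegment = conn.createTransientSegment(targetSegment); // step 1
                conn.append(transientSegment, event);                                  // step 2
                conn.flush(targetSegment);                                             // step 3
                conn.flush(transientSegment);                                          // step 4
                if (conn.merge(transientSegment, targetSegment)) {                     // steps 5-6
                    return;
                }
                // Sealed target: repeat steps 1-6.
            }
            throw new IllegalStateException("large event not merged after " + maxAttempts + " attempts");
        }
    }
}
```

Flushing the target segment before the merge is what preserves order: every earlier event is acknowledged on the target before the large event lands there. A real writer would presumably also re-resolve the target segment after a SegmentIsSealed reply before retrying; the sketch leaves that out.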