PDP 20 (TXN Timeouts)
Status: Under discussion
Related issues:
Transactions enable writers to perform multiple writes to a stream atomically. The current API call to begin a transaction looks like this:
Transaction<Type> beginTxn(long transactionTimeout,
long maxExecutionTime,
long scaleGracePeriod);
The three parameters of the call configure different timeouts for a txn:
- transactionTimeout is a lease timeout. If no client pings the controller for this transaction within the specified period, the txn aborts.
- maxExecutionTime is the maximum amount of time a txn is allowed to remain open, regardless of whether a client is pinging the controller for that txn.
- scaleGracePeriod is the maximum amount of time a txn is allowed to block a stream scaling event without aborting. Scaling events currently need to wait until outstanding txns complete before they can proceed. Consequently, choosing a larger value for scaleGracePeriod means that stream scaling events can be blocked for longer while waiting on a txn.
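To make the interaction between the three timeouts concrete, the following is a minimal sketch of how an abort decision could be derived from them. It is not taken from the Pravega code base: the class name, the Decision enum, the NO_SCALE sentinel, the order of the checks, and the assumption that the grace period is measured from the moment a scaling event starts waiting are all hypothetical.

```java
// Hypothetical model of the three txn timeouts; not Pravega code.
final class TxnTimeoutSketch {
    enum Decision { KEEP_OPEN, ABORT_LEASE_EXPIRED, ABORT_MAX_EXECUTION, ABORT_SCALE_GRACE }

    // Sentinel meaning "no scaling event is currently waiting on this txn".
    static final long NO_SCALE = -1L;

    // All arguments are in milliseconds. The check order is an assumption.
    static Decision evaluate(long nowMs, long beginMs, long lastPingMs, long scaleRequestedMs,
                             long transactionTimeoutMs, long maxExecutionTimeMs,
                             long scaleGracePeriodMs) {
        // maxExecutionTime applies even if the client keeps pinging.
        if (nowMs - beginMs > maxExecutionTimeMs) {
            return Decision.ABORT_MAX_EXECUTION;
        }
        // transactionTimeout is the lease: it expires if no ping arrived in time.
        if (nowMs - lastPingMs > transactionTimeoutMs) {
            return Decision.ABORT_LEASE_EXPIRED;
        }
        // scaleGracePeriod bounds how long the txn may block a waiting scaling event.
        if (scaleRequestedMs != NO_SCALE && nowMs - scaleRequestedMs > scaleGracePeriodMs) {
            return Decision.ABORT_SCALE_GRACE;
        }
        return Decision.KEEP_OPEN;
    }
}
```

In this model a txn that is pinged diligently can still be aborted by either of the other two limits, which is why all three values have to be reasoned about together.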
One problem with configuring new txns this way is that the values are difficult to choose, because there are three different timeouts to reason about. The second problem, and perhaps the more severe one, is that the scale grace period has conflicting goals. A longer timeout favors correctness, as the application does not want a txn to time out prematurely. A shorter timeout favors scaling: once the system determines that scaling needs to happen, the application gets access to the new segments sooner.
- Lease timeouts
- Scale grace period
- Maximum execution time
The main goal of txn leases is to enable the system to reclaim resources quickly when a txn has been left open, for example because the client crashed. The mechanism works by having the writer client ping the txn periodically to keep it open.
The recommendation in this proposal is to keep the lease mechanism, but make the pings internal to the client. We recommend moving the lease timeout to configuration rather than passing it on each API call.
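As an illustration of what internal pings driven by a configured lease timeout might look like, here is a minimal sketch. Everything in it is an assumption made for illustration: the InternalLeaseRenewer class, the choice to ping at half the lease timeout, and the Runnable standing in for the actual ping to the controller are not part of the Pravega client API.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; none of these names belong to the Pravega API.
final class InternalLeaseRenewer implements AutoCloseable {
    private final long leaseTimeoutMs; // read once from client configuration
    private final ScheduledExecutorService scheduler;

    InternalLeaseRenewer(long leaseTimeoutMs) {
        if (leaseTimeoutMs <= 0) {
            throw new IllegalArgumentException("lease timeout must be positive");
        }
        this.leaseTimeoutMs = leaseTimeoutMs;
        // Daemon thread so that background pings never keep the JVM alive.
        this.scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "txn-lease-renewer");
            t.setDaemon(true);
            return t;
        });
    }

    // Ping well before the lease expires; half the lease is an arbitrary choice.
    long pingIntervalMs() {
        return Math.max(1L, leaseTimeoutMs / 2);
    }

    // Starts periodic pings for one txn. The caller cancels the returned
    // handle when the txn commits or aborts.
    ScheduledFuture<?> keepAlive(Runnable pingController) {
        long interval = pingIntervalMs();
        return scheduler.scheduleAtFixedRate(pingController, interval, interval,
                TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

With a helper of this kind, the application would no longer pass a lease value on every call to begin a txn. If the client process crashes, the pings simply stop and the controller can reclaim the txn once the lease expires.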
We propose to remove the scale grace period by implementing rolling transactions. With rolling transactions, outstanding txns do not block scaling events. The current grace period for stream scaling has the drawback described in the motivation: it has to trade off txn correctness against prompt scaling.
We propose to either remove the maximum execution time or move it to configuration.