-
Notifications
You must be signed in to change notification settings - Fork 408
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support event time with Pravega Streams #191
Comments
Speaking to what can be done in the connector itself, have a look at FLINK-3375 for a rationale as to why some connectors support a per-partition assigner. For Pravega, segments are unlikely to contain strictly-ascending event timestamps, correct? The routing key is more likely to but that doesn't seem relevant here. |
yeah @EronWright without ascending event timestamps (atleast at a per segment level), it will be difficult to emit flink watermarks from the source connectors. We will have to keep this in mind too when we review the design for event time support. |
The current thinking around time domains is the following: 1- Event time is better handled by the application. We currently do not plan to incorporate time in our API. |
We have addressed the notion of time association with events as part of PDP 33 (https://github.com/pravega/pravega/wiki/PDP-33:-Watermarking). |
Pravega currently supports processing time seeks over a stream. Ideally, we should also support event time seeks so that applications can start processing streamed data using event rather processing time. See here for an informal definition of processing vs. event time:
https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101
The general idea is that events might be generated way before they are actually produced into a stream and applications often are interested in generation time references rather than the time the event has been produced into the streaming system.
To support this feature, we need to support at least the application passing a timestamp when it produces an event. Internally, the controller needs to map event time to segments. Note that events might not be ordered with respect to event time when they are produced. Consequently, they aren't guaranteed to be in monotonically increasing order. To have them in monotonically increasing order, we can do one of the following:
1- Enforce it using a technique like the one in KIP-32
2- Bucketize events according to timestamps, each bucket being a segment or set of segments
3- Sort the events
Option 3 sounds like a bad idea because it could be expensive. Options 1 and 2 seem reasonable. Any other approach worth considering?
The text was updated successfully, but these errors were encountered: