Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support event time with Pravega Streams #191

Closed
fpj opened this issue Nov 22, 2016 · 4 comments
Closed

Support event time with Pravega Streams #191

fpj opened this issue Nov 22, 2016 · 4 comments
Assignees
Labels
kind/feature New feature that should be added

Comments

@fpj
Copy link
Contributor

fpj commented Nov 22, 2016

Pravega currently supports processing time seeks over a stream. Ideally, we should also support event time seeks so that applications can start processing streamed data using event rather processing time. See here for an informal definition of processing vs. event time:

https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101

The general idea is that events might be generated way before they are actually produced into a stream and applications often are interested in generation time references rather than the time the event has been produced into the streaming system.

To support this feature, we need to support at least the application passing a timestamp when it produces an event. Internally, the controller needs to map event time to segments. Note that events might not be ordered with respect to event time when they are produced. Consequently, they aren't guaranteed to be in monotonically increasing order. To have them in monotonically increasing order, we can do one of the following:

1- Enforce it using a technique like the one in KIP-32
2- Bucketize events according to timestamps, each bucket being a segment or set of segments
3- Sort the events

Option 3 sounds like a bad idea because it could be expensive. Options 1 and 2 seem reasonable. Any other approach worth considering?

@EronWright
Copy link
Contributor

EronWright commented Apr 12, 2017

Speaking to what can be done in the connector itself, have a look at FLINK-3375 for a rationale as to why some connectors support a per-partition assigner. For Pravega, segments are unlikely to contain strictly-ascending event timestamps, correct? The routing key is more likely to but that doesn't seem relevant here.

@skrishnappa
Copy link
Contributor

yeah @EronWright without ascending event timestamps (atleast at a per segment level), it will be difficult to emit flink watermarks from the source connectors. We will have to keep this in mind too when we review the design for event time support.

@fpj
Copy link
Contributor Author

fpj commented Mar 20, 2018

The current thinking around time domains is the following:

1- Event time is better handled by the application. We currently do not plan to incorporate time in our API.
2- There are PDPs, like 26 and 28, that focus on ingestion time and some notion of weak ordering across segments to enable more efficient processing of events

@EronWright EronWright removed this from the Sprint 9 milestone May 10, 2019
@shiveshr
Copy link
Contributor

shiveshr commented Oct 3, 2019

We have addressed the notion of time association with events as part of PDP 33 (https://github.com/pravega/pravega/wiki/PDP-33:-Watermarking).

@shiveshr shiveshr closed this as completed Oct 3, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature New feature that should be added
Projects
None yet
Development

No branches or pull requests

7 participants