Add offset, timestamp and partition filter pushdown to Kafka (seek) #4546

Closed
gschmutz opened this issue Jul 23, 2020 · 2 comments
Labels
enhancement New feature or request

Comments

@gschmutz

Currently the Kafka connector does not push down query filters on offset, partition or timestamp.

It would be nice to have the following filters pushed down to a Kafka seek operation:

  • _partition_offset between start_offset and end_offset
  • _partition_offset between start_timestamp and end_timestamp (where the timestamps are translated into offsets using the Kafka offsetsForTimes method)
  • _partition_id equal to one or more partition ids
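
To illustrate the second option, here is a minimal sketch of how a timestamp range could be turned into an offset range. It does not call Kafka; it only imitates the documented behaviour of offsetsForTimes (earliest offset whose timestamp is greater than or equal to the requested one, nothing if there is none). The function names are hypothetical, and the sketch assumes a single partition whose offsets start at 0 and whose timestamps never decrease.

```python
from bisect import bisect_left
from typing import Optional, Sequence, Tuple


def offset_for_time(timestamps: Sequence[int], target_ms: int) -> Optional[int]:
    """Imitate offsetsForTimes for one partition (hypothetical helper).

    `timestamps[i]` is the timestamp of the record at offset i.
    Assumption: timestamps are non-decreasing, so binary search is valid.
    Returns the earliest offset with timestamp >= target_ms, or None.
    """
    idx = bisect_left(timestamps, target_ms)
    return idx if idx < len(timestamps) else None


def timestamp_range_to_offsets(
    timestamps: Sequence[int], start_ms: int, end_ms: int
) -> Tuple[int, int]:
    """Translate start_ms <= ts <= end_ms into a half-open offset range."""
    log_end = len(timestamps)
    begin = offset_for_time(timestamps, start_ms)
    if begin is None:
        # No record is new enough: the range is empty.
        return (log_end, log_end)
    # First offset strictly after end_ms marks the exclusive upper bound.
    end = offset_for_time(timestamps, end_ms + 1)
    return (begin, log_end if end is None else end)


partition_timestamps = [1000, 1000, 1500, 2000, 2500]
seek_range = timestamp_range_to_offsets(partition_timestamps, 1000, 2000)
```

A consumer would then seek to the first offset of the range and stop reading at the second, instead of scanning the whole partition.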

Additionally, the timestamp header field of a Kafka record should be exposed as an internal field (e.g. _timestamp).

The semantics of _timestamp are not the same as those of the second filter option above, so I don't think it would be correct to implement the second option on top of this new internal field, for example as WHERE _timestamp BETWEEN start_timestamp AND end_timestamp. It would be better to use a separate internal field for that, such as _timestamp_offset.
The reason is that the Kafka producer client can set the timestamp header to the real "event time", which may lie in the past, for example if the client was disconnected. When the message is written to Kafka, it gets its offset at "ingestion time", which in that case differs from the "event time". A push-down using a seek can only be based on "ingestion time" (by translating the time to an offset), never on "event time". It would of course still be good to allow a WHERE clause on the "event time" as well, i.e. on _timestamp.
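
A small sketch of why an "event time" filter cannot be reduced to a single seek range. The records and helper names below are made up for illustration; the event times are deliberately out of order, as they could be when a disconnected client delivers late.

```python
from typing import List, Sequence, Tuple

# (offset, event_time_ms); offsets follow ingestion order, event times do not.
Record = Tuple[int, int]


def matching_offsets(records: Sequence[Record], start_ms: int, end_ms: int) -> List[int]:
    """Offsets of records whose event time lies in [start_ms, end_ms]."""
    return [offset for offset, event_time in records if start_ms <= event_time <= end_ms]


def is_contiguous(offsets: Sequence[int]) -> bool:
    """True if the offsets form one gap-free run, i.e. a single seek range."""
    return all(b - a == 1 for a, b in zip(offsets, offsets[1:]))


records: List[Record] = [(0, 100), (1, 200), (2, 50), (3, 300), (4, 150), (5, 400)]

hits = matching_offsets(records, 100, 200)
single_range_is_enough = is_contiguous(hits)
```

Here the matching offsets are 0, 1 and 4, so reading offsets 0 through 4 would also return offsets 2 and 3, which do not satisfy the filter. A filter on _timestamp therefore still has to be evaluated record by record.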

@findepi findepi added the enhancement New feature or request label Jul 23, 2020
@wangli-td
Contributor

@findepi please help review #4805

@hashhar
Member

hashhar commented Feb 22, 2024

#4805 was merged, and as of today we have pushdown for _partition and _offset. _timestamp can also be pushed down, depending on how Kafka and the connector are configured. See https://trino.io/docs/current/connector/kafka.html#kafka-timestamp-upper-bound-force-push-down-enabled

@hashhar hashhar closed this as completed Feb 22, 2024