-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add write support for kafka connector #4230
Conversation
a119b37
to
70648c3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed part of it and it looks good! I made a few comments, I will continue reviewing tomorrow.
presto-kafka/src/main/java/io/prestosql/plugin/kafka/KafkaMetadata.java
Outdated
Show resolved
Hide resolved
presto-kafka/src/main/java/io/prestosql/plugin/kafka/KafkaPageSink.java
Outdated
Show resolved
Hide resolved
presto-kafka/src/main/java/io/prestosql/plugin/kafka/KafkaPageSink.java
Outdated
Show resolved
Hide resolved
presto-kafka/src/main/java/io/prestosql/plugin/kafka/KafkaHandleResolver.java
Outdated
Show resolved
Hide resolved
presto-kafka/src/main/java/io/prestosql/plugin/kafka/KafkaPageSinkProvider.java
Show resolved
Hide resolved
presto-kafka/src/main/java/io/prestosql/plugin/kafka/KafkaPageSinkProvider.java
Outdated
Show resolved
Hide resolved
presto-kafka/src/main/java/io/prestosql/plugin/kafka/encoder/EncoderColumnHandle.java
Show resolved
Hide resolved
presto-kafka/src/main/java/io/prestosql/plugin/kafka/encoder/avro/AvroRowEncoder.java
Show resolved
Hide resolved
presto-kafka/src/main/java/io/prestosql/plugin/kafka/encoder/csv/CsvColumnEncoder.java
Outdated
Show resolved
Hide resolved
presto-kafka/src/main/java/io/prestosql/plugin/kafka/encoder/csv/CsvRowEncoder.java
Outdated
Show resolved
Hide resolved
presto-kafka/src/main/java/io/prestosql/plugin/kafka/encoder/avro/AvroRowEncoder.java
Show resolved
Hide resolved
presto-kafka/src/main/java/io/prestosql/plugin/kafka/encoder/EncoderColumnHandle.java
Show resolved
Hide resolved
presto-kafka/src/main/java/io/prestosql/plugin/kafka/KafkaPageSinkProvider.java
Outdated
Show resolved
Hide resolved
presto-kafka/src/main/java/io/prestosql/plugin/kafka/KafkaPageSink.java
Outdated
Show resolved
Hide resolved
70648c3
to
7bd5d93
Compare
Made revisions suggested by @aalbu
|
cc @elonazoulay |
d441a7b
to
b4bd592
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
few random comments, I need to get back it later
presto-kafka/src/main/java/io/prestosql/plugin/kafka/KafkaFieldValueProvider.java
Outdated
Show resolved
Hide resolved
presto-kafka/src/main/java/io/prestosql/plugin/kafka/KafkaFieldValueProvider.java
Outdated
Show resolved
Hide resolved
presto-kafka/src/main/java/io/prestosql/plugin/kafka/KafkaFieldValueProvider.java
Outdated
Show resolved
Hide resolved
presto-kafka/src/main/java/io/prestosql/plugin/kafka/KafkaInsertTableHandle.java
Outdated
Show resolved
Hide resolved
presto-kafka/src/main/java/io/prestosql/plugin/kafka/KafkaInsertTableHandle.java
Outdated
Show resolved
Hide resolved
presto-kafka/src/main/java/io/prestosql/plugin/kafka/KafkaOutputTableHandle.java
Outdated
Show resolved
Hide resolved
presto-kafka/src/main/java/io/prestosql/plugin/kafka/encoder/csv/CsvRowEncoder.java
Show resolved
Hide resolved
...fka/src/main/java/io/prestosql/plugin/kafka/encoder/json/CustomDateTimeJsonFieldEncoder.java
Outdated
Show resolved
Hide resolved
presto-kafka/src/main/java/io/prestosql/plugin/kafka/encoder/dummy/DummyRowEncoder.java
Outdated
Show resolved
Hide resolved
...fka/src/main/java/io/prestosql/plugin/kafka/encoder/json/CustomDateTimeJsonFieldEncoder.java
Outdated
Show resolved
Hide resolved
...fka/src/main/java/io/prestosql/plugin/kafka/encoder/json/CustomDateTimeJsonFieldEncoder.java
Outdated
Show resolved
Hide resolved
presto-kafka/src/main/java/io/prestosql/plugin/kafka/serdes/csv/KafkaCsvSerializer.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(skimming)
presto-kafka/src/main/java/io/prestosql/plugin/kafka/encoder/EncoderColumnHandle.java
Show resolved
Hide resolved
presto-kafka/src/main/java/io/prestosql/plugin/kafka/encoder/avro/AvroRowEncoder.java
Outdated
Show resolved
Hide resolved
presto-kafka/src/main/java/io/prestosql/plugin/kafka/PlainTextKafkaProducerFactory.java
Outdated
Show resolved
Hide resolved
presto-kafka/src/main/java/io/prestosql/plugin/kafka/PlainTextKafkaProducerFactory.java
Show resolved
Hide resolved
presto-kafka/src/main/java/io/prestosql/plugin/kafka/KafkaProducerFactory.java
Outdated
Show resolved
Hide resolved
presto-kafka/src/main/java/io/prestosql/plugin/kafka/KafkaFieldValueProvider.java
Outdated
Show resolved
Hide resolved
presto-kafka/src/main/java/io/prestosql/plugin/kafka/encoder/EncoderErrorCode.java
Show resolved
Hide resolved
presto-kafka/src/main/java/io/prestosql/plugin/kafka/encoder/EncoderFieldValueProvider.java
Outdated
Show resolved
Hide resolved
...fka/src/main/java/io/prestosql/plugin/kafka/encoder/json/CustomDateTimeJsonFieldEncoder.java
Outdated
Show resolved
Hide resolved
One non-obvious thing to me is about: i assume @charlesjmorgan @aalbu you considered that, so perhaps there is an answer ready. |
@findepi Recently we changed the encoder to be Kafka specific so instead of returning a I'm not sure what the best option is, keeping encoding and decoding separate or combining them. I'm also not sure if it would be better to have a one-size-fits-all encoder/decoder or make specific ones for each connector that might be optimized for each specific use case. Any thoughts you (or anyone else) have on this would be helpful. |
Perhaps. I advocated for an iterative approach, where we implement the functionality for the connector at hand and then we can consider some abstractions that are more generic. I feel that trying to generalize too early can lead to constraining decisions.
So basically |
that's a very valid question. Since encoding seems pretty symmetric to decoding, it seems justified to bundle them in single class. I understand that, to realize this symmetry we would need kafka-independent types in the encoders interfaces.
Because that's just convenient. You need to plug flexible interface when cost of copying data is large (ie data is large). @losipiuk please chime in, since you were working with this code more than I did. |
presto-kafka/src/main/java/io/prestosql/plugin/kafka/KafkaProducerFactory.java
Outdated
Show resolved
Hide resolved
presto-kafka/src/main/java/io/prestosql/plugin/kafka/KafkaPageSinkProvider.java
Outdated
Show resolved
Hide resolved
presto-kafka/src/main/java/io/prestosql/plugin/kafka/KafkaInsertTableHandle.java
Outdated
Show resolved
Hide resolved
presto-kafka/src/main/java/io/prestosql/plugin/kafka/PlainTextKafkaProducerFactory.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partial review.
presto-kafka/src/main/java/io/prestosql/plugin/kafka/KafkaFieldValueProvider.java
Outdated
Show resolved
Hide resolved
presto-kafka/src/main/java/io/prestosql/plugin/kafka/KafkaPageSink.java
Outdated
Show resolved
Hide resolved
presto-kafka/src/main/java/io/prestosql/plugin/kafka/KafkaPageSinkProvider.java
Outdated
Show resolved
Hide resolved
presto-kafka/src/main/java/io/prestosql/plugin/kafka/KafkaProducerFactory.java
Outdated
Show resolved
Hide resolved
presto-kafka/src/main/java/io/prestosql/plugin/kafka/encoder/avro/AvroColumnEncoder.java
Outdated
Show resolved
Hide resolved
presto-kafka/src/main/java/io/prestosql/plugin/kafka/encoder/dummy/DummyRowEncoder.java
Outdated
Show resolved
Hide resolved
presto-kafka/src/main/java/io/prestosql/plugin/kafka/encoder/raw/RawColumnEncoder.java
Outdated
Show resolved
Hide resolved
presto-kafka/src/main/java/io/prestosql/plugin/kafka/encoder/raw/RawColumnEncoder.java
Outdated
Show resolved
Hide resolved
presto-kafka/src/main/java/io/prestosql/plugin/kafka/encoder/raw/RawColumnEncoder.java
Outdated
Show resolved
Hide resolved
presto-kafka/src/main/java/io/prestosql/plugin/kafka/encoder/raw/RawColumnEncoder.java
Outdated
Show resolved
Hide resolved
presto-kafka/src/main/java/io/prestosql/plugin/kafka/encoder/avro/AvroColumnEncoder.java
Outdated
Show resolved
Hide resolved
presto-kafka/src/main/java/io/prestosql/plugin/kafka/encoder/avro/AvroColumnEncoder.java
Outdated
Show resolved
Hide resolved
But what's the value of that? There isn't much code reuse. It's not an intuitive way of working with target systems. We might make Kafka work with it, but do we know that it's a good fit for other systems? Redis is using the |
This isn't about code reuse, but more about physically bundling things that are coupled.
I don't know. IF the decoding was accepting |
I agree that bundling encoders and decoders in single class would make reading and understanding implementation easier. Though it will push us more towards reusing same logic in very much different connectors. I think the fact that Kafka and Redis are sharing decoder implementation does more harm than good (see example @findepi gave above). I think (if we find resources for that) we should untangle the situation and do either: a) rework presto-decoder module to make it more connector agnostic and keep it reused where we currently use it. Then we can combine b) stop using presto-decoder in connectors it does not play well with (Redis?). This can potentially result in dropping presto-decoder at all if it looks like having separate implementation for each connector is better. Then we can combine decoder and encoder logic in Kafka connector module. Short term probably it does not matter much. At least for this PR I would keep them separate so we can focus on merging it sooner. Then as next step (we should not postpone that) we can work on refactor and move it towards either a) or b). WDYT? |
FWIW, the
@losipiuk not sure what you mean here. How does separate classes vs extending existing classes matter? FYI i do not have very strong opinion either way, but i expect it may be some work to change between approaches. |
It matters in terms how many rounds of review we need. I would prefer to not bloat this PR with extra refactorings. Smaller PRs are much easier to review and work with. And sooner we merge this beast and continue with smaller gradual changes the better IMO. |
b4bd592
to
acdfe1b
Compare
Thank you @aalbu @findepi @losipiuk and @kokosing for all your feedback, it has been very helpful! I hope I got everything (I triple checked so should be good). I intentionally left out changes that I didn't think were in the scope of this PR, I will revisit those in the future. I am going to split this pr into 5 parts. The first will be basic functionality for inserts and the four after that will each be for a specific encoder format. lmk if you think I should split it up differently Base functionality/CSV encoder PR - #4287
|
Add write support for the kafka connector
Add encoder to serialize message into avro, csv, json, and raw formats (works for primitives and json date/time types)
Currently some changes proposed in #4183 are included in this pr, but once those get merged I'll rebase (done)
Closes #3980