Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for date, time_millis, and timestamp_millis to AvroColumnDecoder #13070
Add support for date, time_millis, and timestamp_millis to AvroColumnDecoder #13070
Changes from all commits
54dab35
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if it is the correct way to test it. Since we pass the value to be serialized as long - Avro serializes/de-serializes it as long but as per specification it looks like for Java we would pass the input as
Instant
for Timestamp logical type - register the conversion forGeneircRecord
and serialize the data so when we de-serialize we would get them asInstant
instead of long (we could do the conversion manually). Similar mapping/conversion is implemented for Java classes generated by Avro.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TypeConversion can be added using similar snippets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. To support many and many cases, there's a lot of work to do. I don't think one PR should handle all of it or you can try it out. I'd say the patch here can resolve the specific issue of bumping trino version for Pulsar Trino Plugin.
What does it mean? If I get it right you suggest one more possible case avro timestamp can be represented as. Then my answer is as above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I agree but I'm afraid if this is the correct approach. One major thing which bugs me is that we manually do the conversion from a long to a micro-second precision compatible format - but ideally it should happen as a part of Avro library and it does support it. And one additional benefit is that in case if Avro modifies their internal representation it shouldn't affect us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you show me how Avro "does support it"? I'm glad to adapt such a way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method shows on how Avro implements it
https://github.com/apache/avro/blob/a2c111dc3581f9549b173f7c6a9073cfbf173558/lang/java/avro/src/test/java/org/apache/avro/generic/TestGenericLogicalTypes.java#L368
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! I'm looking into it this week :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Praveen2112 after a review I think the conversion is inevitable because Trino has it's own representation, introduced by:
I take a look at the TimeConvertions, but it seems to be used on the fly data in-memory processing, not during deserialization. Pulsar Trino Plugin users meet a use case that the actual payload is int/long according to the logical type, and there's no
Instant
orLocalTime
to be converted.I don't have a good idea to make further revision on this codepath. You're welcome to show a patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any update on this patch?
cc @Praveen2112 @kokosing @hashhar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about for time-micros ? I guess it is also an valid Avro logical type ? Similarly for timestamp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They can work. For the original purpose, I prepare this patch only to adapt Pulsar Trino Plugin's use case. Since the review process goes so long now, I'd prefer after a good result on the mills part and I may continue the micros part. Otherwise, it's extending the scope and I don't see an end.
BTW, Trino itself handles layout changes of TimestampType and TimeType in different PRs, so I think it's not a requirement we must do it at once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do agree we can have different PRs for a dedicated datatype. But is there a way to differentiate between if the data type is of time-micros or time-mills. If a datatype is annotated with a LogicalType
time-micro
how do we restrict decoding that datatype ?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the convertor's responsibility to pass the correct Trino type.
AvroColumnDecoder
accepts the Trino type the caller passes to it.