Add Kafka Json encoder #4477

Merged 1 commit on Jul 31, 2020

charlesjmorgan
Member

@charlesjmorgan charlesjmorgan commented Jul 16, 2020

Add JsonRowEncoder and JsonRowEncoderFactory
Add test case in io.prestosql.plugin.kafka.TestKafkaIntegrationSmokeTest#testRoundTripAllFormats

Might it be better to have separate formatters for the date/time types rather than methods in JsonRowEncoder?

closes #3980

@charlesjmorgan
Member Author

Implemented all the changes in the resolved review comments, will make it through the rest on Monday. I changed the date/time formatting to precompile a list of functions to format the values, lmk what you think of this.

Member

@losipiuk losipiuk left a comment

Thanks. I left some comments, but this is not a full review. Structurally it looks much better. Timestamp handling is hard, as I am honestly not sure what the expected behaviour is here.

@charlesjmorgan
Member Author

charlesjmorgan commented Jul 23, 2020

Changes:

  • Restructure Json Date/Time formatters according to the suggestion made by @losipiuk
  • Format functions return a parameterized Function
  • Add (better) unit tests for format functions
  • Restructure Json Date/Time round trip tests in TestKafkaIntegrationSmokeTest for readability

@charlesjmorgan charlesjmorgan force-pushed the kafka-json-encoder branch 3 times, most recently from 955c7f6 to 30b4b63, July 27, 2020 17:59
@charlesjmorgan
Member Author

charlesjmorgan commented Jul 27, 2020

I think I made the suggested legacy timestamp changes. The legacy format functions definitely need a close look, though.

@charlesjmorgan charlesjmorgan force-pushed the kafka-json-encoder branch 2 times, most recently from fdcb87f to 33f8a82, July 27, 2020 21:03
Member

@aalbu aalbu left a comment

Legacy semantics imply that temporal types are interpreted in the session’s time zone. See, for example, the Javadoc on TimestampType. It seems to me that throughout these changes, the UTC time zone is used instead.

Member

@aalbu aalbu left a comment

Mostly stylistic suggestions.

if (type == DATE) {
    return JsonFormatFunction.builder().setFormatDateFunc(getDateFormatFunction(dataFormat, formatHint)).build();
}
else if (type == TIME) {
Member

You actually don't need all the elses.
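
As a rough illustration of this point: when every branch returns, the chain can be written as independent `if` statements. The enum and the returned strings below are hypothetical stand-ins, not the PR's type constants or its `JsonFormatFunction` builder.

```java
// Hypothetical stand-ins for the type constants in the snippet; not the real Presto types.
enum TemporalKind { DATE, TIME, TIME_WITH_TIME_ZONE }

final class FormatterSelection
{
    private FormatterSelection() {}

    // Each branch returns, so a following "else" adds nothing.
    static String describeFormatter(TemporalKind kind)
    {
        if (kind == TemporalKind.DATE) {
            return "date formatter";
        }
        if (kind == TemporalKind.TIME) {
            return "time formatter";
        }
        if (kind == TemporalKind.TIME_WITH_TIME_ZONE) {
            return "time with time zone formatter";
        }
        throw new IllegalArgumentException("Unsupported type: " + kind);
    }
}
```

The behaviour is the same as the `else if` chain; only the nesting of the control flow changes.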

public static Function<Long, String> formatDateFunc(String formatHint)
{
    try {
        DateTimeFormatter formatter = DateTimeFormat.forPattern(formatHint).withLocale(Locale.ENGLISH).withZoneUTC();
Member

You can use a static import (below too).

{
    try {
        DateTimeFormatter formatter = DateTimeFormat.forPattern(formatHint).withLocale(Locale.ENGLISH).withZoneUTC();
        return millis -> (new DateTime(millis, DateTimeZone.UTC)).toString(formatter);
Member

Let's be consistent in the way we format. Either new DateTime(...).toString(formatter) as here, or formatter.print(new DateTime(...)) as below.


public static class Builder
{
    private Function<Long, String> formatDateFunc = (ignored) -> { throw new RuntimeException("unsupported argument type"); };
Member

You could define

private static final Function<Long, String> UNIMPLEMENTED = ignored -> { throw new RuntimeException("unsupported argument type"); };

and reference that:

private Function<Long, String> formatDateFunc = UNIMPLEMENTED;
private Function<Long, String> formatTimeFunc = UNIMPLEMENTED;
...
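
A minimal, self-contained sketch of how the suggestion could look in a builder-like class. The class name, the setter, and the `formatDate`/`formatTime` methods are assumptions made for illustration; only the `UNIMPLEMENTED` constant and its message come from the comment above.

```java
import java.util.function.Function;

// Illustrative only: names loosely follow the snippet under review.
final class FormatFunctionDefaults
{
    // One shared default instead of repeating the same throwing lambda for every field.
    private static final Function<Long, String> UNIMPLEMENTED = ignored -> {
        throw new RuntimeException("unsupported argument type");
    };

    private Function<Long, String> formatDateFunction = UNIMPLEMENTED;
    private Function<Long, String> formatTimeFunction = UNIMPLEMENTED;

    FormatFunctionDefaults setFormatDateFunction(Function<Long, String> function)
    {
        this.formatDateFunction = function;
        return this;
    }

    String formatDate(long value)
    {
        return formatDateFunction.apply(value);
    }

    String formatTime(long value)
    {
        // Still the shared default unless a setter replaced it, so this throws.
        return formatTimeFunction.apply(value);
    }
}
```

Any function that was never set keeps the shared default and fails with the same message when it is called.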


public static long daysToEpochMillis(long value)
{
    return DAYS.toMillis(value);
Member

This is subjective, but you could just as well inline. I think DAYS.toMillis(value) reads pretty well.


public static Function<Long, String> formatMillisWithTZFunc()
{
    return encodedMillisWithTZ -> String.valueOf(unpackMillisUtc(encodedMillisWithTZ));
Member

Perhaps we should not support this scenario. This does not seem a meaningful return value - without the tz info, the milliseconds don't make much sense here, right?

Member

Sorry, that was wrong. This format implies millis since epoch.
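
To see why the output stays meaningful without the zone, it may help to look at how such a value could be packed. The sketch below assumes a layout with the UTC millis in the upper bits and a 12-bit zone key in the lower bits. That layout, the class name, and the `pack` helper are assumptions for illustration, not a statement about the engine's actual encoding.

```java
final class PackedTimestampSketch
{
    private static final int ZONE_KEY_BITS = 12;     // assumed width of the zone key
    private static final long ZONE_KEY_MASK = 0xFFF; // lower 12 bits

    private PackedTimestampSketch() {}

    static long pack(long millisUtc, int zoneKey)
    {
        return (millisUtc << ZONE_KEY_BITS) | (zoneKey & ZONE_KEY_MASK);
    }

    static long unpackMillisUtc(long packed)
    {
        // Arithmetic shift, so instants before the epoch keep their sign.
        return packed >> ZONE_KEY_BITS;
    }

    static int unpackZoneKey(long packed)
    {
        return (int) (packed & ZONE_KEY_MASK);
    }

    // Millis since epoch identify the instant on their own; the zone only affects display.
    static String formatMillis(long packed)
    {
        return String.valueOf(unpackMillisUtc(packed));
    }
}
```

Under these assumptions, two values packed from the same instant with different zone keys format to the same string, which is the expected result for a millis-since-epoch format.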

        unpackZoneKey(encodedMillisWithTZ).getZoneId()));
}

private static LocalDateTime localDateTimeOfEpochMillis(long epochMillis)
Member

Perhaps this would warrant being public in JsonFormatFunctions. It's being used in ISO8601FormatFunctions, as well.

    return millis -> String.valueOf(millisToSeconds(millis));
}

public static Function<Long, String> formatSecondsWithTZFunc()
Member

Another situation in which we're throwing away tz info.

Member

This is like unix time(), it's fine.

Member

@findepi findepi left a comment

Skimming.

I have several code-style comments, which generally apply to other places as well.

I have some comments about timestamp formatting semantics.
Let's talk about them more.

checkArgument(isSupportedType(columnHandle.getType()), "Unsupported column type '%s' for column '%s'", columnHandle.getType(), columnHandle.getName());

if (isDateTimeType(columnHandle.getType())) {
    checkArgument(columnHandle.getDataFormat() != null, "Unsupported or no dataFormat '%s' defined for temporal column '%s'", columnHandle.getDataFormat(), columnHandle.getName());
Member

Suggested change
- checkArgument(columnHandle.getDataFormat() != null, "Unsupported or no dataFormat '%s' defined for temporal column '%s'", columnHandle.getDataFormat(), columnHandle.getName());
+ checkArgument(columnHandle.getDataFormat() != null, "No dataFormat defined for temporal column '%s'", columnHandle.getName());

else if (type == TIME) {
    return JsonFormatFunction.builder().setFormatTimeFunc(getTimeFormatFunction(dataFormat, formatHint)).build();
}
else if (type == TIME_WITH_TIME_ZONE) {
Member

{
    private CustomDateTimeFormatFunctions() {}

    public static Function<Long, String> formatDateFunc(String formatHint)
Member

avoid abbreviations

func -> function

public static Function<Long, String> formatDateFunc(String formatHint)
{
    try {
        DateTimeFormatter formatter = DateTimeFormat.forPattern(formatHint).withLocale(Locale.ENGLISH).withZoneUTC();
Member

is the format hint documented to be Joda Time?

    DateTimeFormatter formatter = DateTimeFormat.forPattern(formatHint).withLocale(Locale.ENGLISH).withZoneUTC();
    return millis -> (new DateTime(millis, DateTimeZone.UTC)).toString(formatter);
}
catch (IllegalArgumentException e) {
Member

Code is correct but I would find it more readable if the lambda was outside of try block...
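
One possible shape for this, sketched with `java.time` rather than Joda Time so that the example has no dependencies (the two libraries use different pattern syntax, so this is not a drop-in replacement). The class name is hypothetical.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.function.Function;

final class PatternFormatSketch
{
    private PatternFormatSketch() {}

    static Function<Long, String> formatDateFunction(String formatHint)
    {
        DateTimeFormatter formatter;
        try {
            // Only pattern parsing can fail here, so only it sits inside the try block.
            formatter = DateTimeFormatter.ofPattern(formatHint, Locale.ENGLISH).withZone(ZoneOffset.UTC);
        }
        catch (IllegalArgumentException e) {
            throw new IllegalArgumentException(String.format("Invalid pattern '%s' passed as format hint", formatHint), e);
        }
        // The lambda is built after the try block; formatter is assigned exactly once, so it can be captured.
        return millis -> formatter.format(Instant.ofEpochMilli(millis));
    }
}
```

With this layout the `catch` clause visibly guards the pattern parsing only, and a failure inside the returned function at call time is not mistaken for an invalid format hint.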

{
    try {
        DateTimeFormatter formatter = DateTimeFormat.forPattern(formatHint).withLocale(Locale.ENGLISH).withZoneUTC();
        return millis -> (new DateTime(millis, DateTimeZone.UTC)).toString(formatter);
Member

I think it would be better to use the session zone here.
This would make it more similar to the non-legacy case, when we implement it.

I.e.
INSERT ... TIMESTAMP 'some_date_time'
would be written as some_date_time in the session zone in legacy mode,
and as some_date_time in non-legacy mode.

If the format hint does not contain the zone,
the two would actually be the same.
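
A minimal sketch of the difference, again using `java.time` in place of Joda Time to stay dependency-free. The class and method names are hypothetical, and the "legacy" conversion below is only a model of the semantics described in this thread (the literal's wall-clock time is interpreted in the session zone).

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class SessionZoneSketch
{
    private SessionZoneSketch() {}

    // Model of legacy semantics: the literal is a wall-clock time in the session zone.
    static long legacyEpochMillis(LocalDateTime literal, ZoneId sessionZone)
    {
        return literal.atZone(sessionZone).toInstant().toEpochMilli();
    }

    // Format an instant with the given pattern, rendered in the given zone.
    static String format(long epochMillis, String pattern, ZoneId zone)
    {
        return DateTimeFormatter.ofPattern(pattern, Locale.ENGLISH)
                .withZone(zone)
                .format(Instant.ofEpochMilli(epochMillis));
    }
}
```

For example, with a session zone of `+02:00`, the literal `2020-07-16 12:00` formatted with the pattern `yyyy-MM-dd HH:mm` comes back as `2020-07-16 12:00` when the session zone is used, but as `2020-07-16 10:00` when UTC is used. The first result matches what the user typed, which is also what a zone-free pattern would produce in the non-legacy case.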

{
    try {
        DateTimeFormatter formatter = DateTimeFormat.forPattern(formatHint).withLocale(Locale.ENGLISH).withZoneUTC();
        return encodedMillisWithTZ -> formatter.print(new DateTime(
Member

encodedMillisWithTZ -> value

        DateTimeZone.forID(unpackZoneKey(encodedMillisWithTZ).getId())));
}
catch (IllegalArgumentException e) {
    throw new IllegalArgumentException(format("Invalid joda pattern '%s' passed as format hint", formatHint), e);
Member

joda -> Joda Time

    return formatTimeFunc.apply(value);
}

public String formatTimeWithTZ(long value)
Member

withTZ -> withTimeZone

@charlesjmorgan
Member Author

Removed temporal column support from this PR; I will open another PR for that.

Member

@aalbu aalbu left a comment

LGTM

@findepi findepi merged commit 85204c1 into trinodb:master Jul 31, 2020
@findepi
Member

findepi commented Jul 31, 2020

Merged, thanks!

@findepi findepi added this to the 340 milestone Jul 31, 2020
@findepi findepi mentioned this pull request Jul 31, 2020
8 tasks
@charlesjmorgan charlesjmorgan deleted the kafka-json-encoder branch June 7, 2021 18:40
Development

Successfully merging this pull request may close these issues.

Add write support for Kafka connector
4 participants