Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added stringtodate and datetostring UDFs #1890

Merged
merged 2 commits into from
Oct 15, 2018

Conversation

andybryant
Copy link
Contributor

Description

Added stringtodate and datetostring UDFs to convert to and from a formatted date string and an integer representing days since epoch. This format is used by Kafka Connect to represent Date objects with no time or time zone component.

In a separate commit on the same PR I also updated stringtotimestamp and timestamptostring to use the new UDF api and to support dynamically supplied format patterns.

Testing done

Added tests for all new functionality and ran locally. Generated and views updated docs locally.

Reviewer checklist

  • Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
  • Ensure relevant issues are linked (description should include text like "Fixes #")

@andybryant andybryant requested review from JimGalasyn and a team as code owners September 14, 2018 03:19
@ghost
Copy link

ghost commented Sep 14, 2018

@confluentinc It looks like @andybryant just signed our Contributor License Agreement. 👍

Always at your service,

clabot

Copy link
Member

@JimGalasyn JimGalasyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@andybryant andybryant force-pushed the feature-datetostring-udf branch from c07fdb5 to 35c3507 Compare September 16, 2018 23:33
@andybryant andybryant force-pushed the feature-datetostring-udf branch 3 times, most recently from adb5cc0 to 780fae4 Compare September 27, 2018 04:54
@@ -375,7 +375,7 @@ public void shouldListFunctions() {

// not going to check every function
assertThat(functionList.getFunctions(), hasItems(
new SimpleFunctionInfo("TIMESTAMPTOSTRING", FunctionType.scalar),
new SimpleFunctionInfo("EXTRACTJSONFIELD", FunctionType.scalar),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functions that use the new Udf annotation don't show up in this list. Changing the test to use another Kudf based function.

@andybryant andybryant force-pushed the feature-datetostring-udf branch from 780fae4 to 1c4291a Compare September 27, 2018 06:23
Copy link
Contributor

@big-andy-coates big-andy-coates left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @andybryant, this is really cool that you've taken the time to raise this.

Sorry for the delay in getting it reviewed.

I've left a few comments and suggestions. Take a look and see what you think.

Thanks,

Andy

@@ -1071,6 +1071,14 @@ Scalar functions
+------------------------+------------------------------------------------------------+---------------------------------------------------+
| CONCAT | ``CONCAT(col1, '_hello')`` | Concatenate two strings. |
+------------------------+------------------------------------------------------------+---------------------------------------------------+
| DATETOSTRING | ``DATETOSTRING(START_DATE, 'yyyy-MM-dd')`` | Converts an INT date value into |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Converts an integer representation of a date into..."

@@ -1140,6 +1148,12 @@ Scalar functions
+------------------------+------------------------------------------------------------+---------------------------------------------------+
| ROUND | ``ROUND(col1)`` | Round a value to the nearest BIGINT value. |
+------------------------+------------------------------------------------------------+---------------------------------------------------+
| STRINGTODATE | ``STRINGTODATE(col1, 'yyyy-MM-dd')`` | Converts a string value in the given |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Converts a string representation of a date in..."

" into an integer representing the days since epoch."

import io.confluent.ksql.util.timestamp.StringToTimestampParser;

public class StringToTimestamp implements Kudf {
@UdfDescription(name = "stringtotimestamp", author = "Confluent",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Converts a string representation of a date ..."

(You're wording on the new UDFs is much better than the wording on the old ones!)

if (timestampParser == null) {
timestampParser = new StringToTimestampParser(args[1].toString());
}
@Udf(description = "Converts a string value in the given format into the BIGINT value"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Converts a string representation of a date..."

+ " days since epoch using the given format pattern."
+ " The format pattern should be in the format expected by"
+ " java.time.format.DateTimeFormatter")
public int stringToDate(final String formattedDate, final String formatPattern) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you look to make use of @UdfParameter on all of your UDF params please. The actual names of the fields are not currently used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to using @UdfParameter for all 4 UDFs

@Test
public void shouldThrowIfFormatInvalid() {
expectedException.expect(KsqlFunctionException.class);
expectedException.expect(UncheckedExecutionException.class);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exception type again.

}

@Test
public void shouldBeThreadSafe() {
public void shouldWorkWithManyDifferentFormatters() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

combine with above again.

"CREATE STREAM TS AS select id, stringtodate(date, format) as ts from test;"
],
"inputs": [
{"topic": "test_topic", "key": 0, "value": "0,zero,2018-05-11,yyyy-MM-dd", "timestamp": 0},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add a literal in one of these formats please?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -0,0 +1,31 @@
{
"comments": [
"You can specify multiple statements per test case, i.e., to set up the various streams needed",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

drop this cut & paste comments section if you don't mind.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done (in both json files)

"CREATE STREAM DATE_STREAM AS select ID, datetostring(START_DATE, DATE_FORMAT) as CUSTOM_FORMATTED_START_DATE from test;"
],
"inputs": [
{"topic": "test_topic", "key": 1, "value": {"ID": 1, "START_DATE": 17662, "DATE_FORMAT": "yyyy-MM-dd"}, "timestamp": 0},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add a literal in one of these formats please?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the last one is literal

@andybryant andybryant force-pushed the feature-datetostring-udf branch from 1c4291a to bea71d0 Compare October 4, 2018 00:49
@andybryant
Copy link
Contributor Author

andybryant commented Oct 4, 2018

Hi @big-andy-coates

Thanks for the feedback - lots of good ideas there. I've pushed another commit that I think addresses all your comments. I kept the 3 separate commits for now to make reviewing easier, but can squash them down to one commit once your happy with it.

Cheers
Andy

@andybryant andybryant force-pushed the feature-datetostring-udf branch 3 times, most recently from da8d37e to d999593 Compare October 4, 2018 01:35
Copy link
Contributor

@big-andy-coates big-andy-coates left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @andybryant LGTM.

Sorry for the delay in reviewing!

Just need another engineer to double check. I'll look to get this cherry picked into 5.1. release.

public String dateToString(
@UdfParameter(value = "epochDays",
description = "The Epoch Day to convert,"
+ " based on the epoch 1970-01-01") final Integer epochDays,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe use int rather than Integer as it'll throw an NPE on null anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

@dguy dguy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @andybryant for submitting this PR! I've left one question, but otherwise LGTM

final Timestamp timestamp = new Timestamp(epochMilli);
final DateTimeFormatter formatter = formatters.get(formatPattern);
return timestamp.toInstant()
.atZone(ZoneId.systemDefault())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want this to use the system timezone? i.e., dates are all UTC in KSQL. So this is deviating from the norm. if this is indeed the case that we want this, then can you update the description etc to make it clear what is happening?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old StringToTimestamp implementation used system default time zone when no time zone was provided. I can change to UTC and update the docs, but wouldn't this be considered a breaking change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've assumed we want to keep the old behaviour for now and have updated the docs accordingly.

@andybryant andybryant force-pushed the feature-datetostring-udf branch 3 times, most recently from e49e2dc to c5df215 Compare October 11, 2018 05:43
@andybryant
Copy link
Contributor Author

Fixed up docs based on @dguy feedback and switched DateToString to use int for epochDays.

@dguy
Copy link
Contributor

dguy commented Oct 11, 2018

Thanks @andybryant can you please merge with the latest master. Assuming it passes we can then merge it. Thanks for your patience

@andybryant andybryant force-pushed the feature-datetostring-udf branch from c5df215 to 0b98da1 Compare October 14, 2018 09:34
@andybryant
Copy link
Contributor Author

@dguy - updated to latest master. Cheers Andy

@dguy dguy merged commit 2ff99fa into confluentinc:master Oct 15, 2018
@dguy
Copy link
Contributor

dguy commented Oct 15, 2018

Thanks for the contribution @andybryant!

@andybryant
Copy link
Contributor Author

Awesome - thanks @dguy
@big-andy-coates - created a separate PR #2056 with the two commits cherry-picked for the 5.1.x branch in case there's still a possibility of getting it into that release

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants