Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More SQL-like Substring #1635

Merged
merged 12 commits into from
Aug 3, 2018
Merged

Conversation

big-andy-coates
Copy link
Contributor

@big-andy-coates big-andy-coates commented Jul 26, 2018

Description

Switch SUBSTRING from SUBSTRING(str, startIndex, endIndex) with zero-based indexing to SUBSTRING(str, pos, len) with:

  • one-based indexing
  • returns null for any null parameters
  • ignores out of bound indexes, (i.e. treats them as being just inbound)
    as used by other SQL providers.

Uses can set ksql.functions.substring.legacy.args to true, (either through server properties or session properties, to switch the implementation back to legacy mode, i.e. SUBSTRING(str, startIndex, endIndex) and error handling that throws a lot of exceptions.

Fixes #1634

Because I've extended the description of SubString, I've also improved the formatting of the output of the DESCRIBE FUNCTION x; in the CLI, so that long descriptions are correctly split across lines and properly indented. New output looks like:

ksql> describe function substring;

Name        : SUBSTRING
Author      : Confluent
Overview    : Returns a substring of the passed in value.
              The behaviour of this function changed in release 5.1. It is 
              possible to switch the function back to pre-v5.1 functionality via
               the setting:
              	ksql.functions.substring.legacy.args
              This can be set globally, through the server configuration file, 
              or per sessions or query via the set command.
Type        : scalar
Jar         : internal
Variations  : 

  Variation   : SUBSTRING(value VARCHAR, pos INT, len INT)
  Returns     : VARCHAR
  Description : Returns a substring of str that starts at pos and is of length 
                len
  value       : The source string. If null, then function returns null.
  pos         : The base-one position the substring starts from. If null, then 
                function returns null. (If in legacy mode, this argument is 
                base-zero)
  len         : The length of the substring to extract. If null, then function 
                returns null. (If in legacy mode, this argument is the endIndex 
                (exclusive), rather than the length

  Variation   : SUBSTRING(value VARCHAR, pos INT)
  Returns     : VARCHAR
  Description : Returns a substring of str that starts at pos  and continues to 
                the end of the string
  value       : The source string. If null, then function returns null.
  pos         : The base-one position the substring starts from. If null, then 
                function returns null. (If in legacy mode, this argument is 
                base-zero)

Related release notes added in this PR: https://github.com/confluentinc/docs/pull/1095

Testing done

Unit tests added and JSON based functional test.

Reviewer checklist

  • Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
  • Ensure relevant issues are linked (description should include text like "Fixes #")

@big-andy-coates big-andy-coates requested review from dguy, rmoff and a team July 26, 2018 09:10
@dguy
Copy link
Contributor

dguy commented Jul 26, 2018

This breaks compatibility with 4.1.x substring function

@big-andy-coates
Copy link
Contributor Author

This breaks compatibility with 4.1.x substring function

Well, good that its not a release issue. Bad that our SUBSTRING is different to everyone else :-/

Copy link
Contributor

@hjafarpour hjafarpour left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

public String substring(final String value, final int startIndex) {
return value.substring(startIndex);
@Udf(description = "Returns a substring of str that starts at pos "
+ "and continues to the end of the string")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also document that the pos starts from 1 not 0.

+ " extends to the character at endIndex -1.")
public String substring(final String value, final int startIndex, final int endIndex) {
return value.substring(startIndex, endIndex);
@Udf(description = "Returns a substring of str that starts at pos and is of length len")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was expecting to see some shiny new @UdfParameter decoration here to help teh user know which arg is the length and which one the startPos ? :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Different branch. That's only available in master. This is targeted at 5.0. Though now its not release related we can change this to master.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now on master branch - so annotations added!

@big-andy-coates big-andy-coates changed the base branch from 5.0.x to master July 30, 2018 13:39
# Conflicts:
#	ksql-engine/src/main/java/io/confluent/ksql/function/udf/string/Substring.java
…s can control via the configuration `ksql.functions.substring.legacy.args`.
@big-andy-coates big-andy-coates requested a review from a team August 1, 2018 23:48
@@ -187,6 +200,12 @@ private static ConfigDef configDef(final boolean current) {
"",
ConfigDef.Importance.LOW,
KSQL_OUTPUT_TOPIC_NAME_PREFIX_DOCS
).define(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can add this to COMPATIBILITY_BREAKING_CONFIG_DEFS on line 119 instead and set the write default to True and the read default to False. That way new queries can use the new SUBSTR behavior, and any existing queries will use the old behaviour.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, cool. Not noticed that before.

I've added it to COMPATIBILITY_BREAKING_CONFIG_DEFS and removed it from the main def list.

@Override
public void configure(final Map<String, ?> props) {
final boolean legacyArgs =
getProps(props, KSQ_FUNCTIONS_PROPERTY_PREFIX + "substring.legacy.args", false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use KSQL_FUNCTIONS_SUBSRTRING_LEGACY_ARGS_CONFIG here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

}

private static int getStartIndex(final String value, final Integer pos) {
return pos < 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If pos is 0 the substr call will throw an IndexOutOfBoundsException. Is that what we want?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good spot! Test case added and code fixed.

@big-andy-coates
Copy link
Contributor Author

@rodesai thanks for the review! I've pushed some changes to address your comments.

Copy link
Contributor

@rodesai rodesai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@big-andy-coates big-andy-coates merged commit 5455868 into confluentinc:master Aug 3, 2018
@big-andy-coates big-andy-coates deleted the substring branch August 3, 2018 12:26
big-andy-coates added a commit to big-andy-coates/ksql that referenced this pull request Aug 14, 2018
…y, so as not to be a breaking change.

Original code change is in confluentinc#1635.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants