More SQL-like Substring (#1635)
* More SQL-like SUBSTRING

Fixes #1634

SUBSTRING is now more SQL-compliant. It has a legacy mode, which users can enable via the configuration `ksql.functions.substring.legacy.args`.
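The commit describes two argument conventions. The Java sketch below contrasts them under stated assumptions. In the new form, `pos` is a base-one position and `len` is a length. In the legacy form, `start` (inclusive) and `end` (exclusive) are base-zero indexes, which is how `java.lang.String.substring` behaves. The class and method names (`SubstringSketch`, `legacySubstring`) are hypothetical. Clamping out-of-range arguments to the string bounds is also an assumption. This is not KSQL's implementation: in KSQL, a single SUBSTRING function is used, and `ksql.functions.substring.legacy.args` selects which convention it follows.

```java
/**
 * Hypothetical sketch of the two SUBSTRING conventions described in this commit.
 * Assumption (not stated in the commit): out-of-range arguments are clamped
 * to the bounds of the string rather than raising an error.
 */
public final class SubstringSketch {

  private SubstringSketch() {
  }

  /** SQL-style SUBSTRING(str, pos): pos is base-one; runs to the end of str. */
  public static String substring(final String str, final Integer pos) {
    if (str == null || pos == null) {
      return null; // null in, null out
    }
    return str.substring(startIndex(str, pos));
  }

  /** SQL-style SUBSTRING(str, pos, len): at most len characters starting at pos. */
  public static String substring(final String str, final Integer pos, final Integer len) {
    if (str == null || pos == null || len == null) {
      return null;
    }
    final int start = startIndex(str, pos);
    // Cap the length at the characters remaining after start (also avoids int overflow).
    final int length = Math.min(Math.max(len, 0), str.length() - start);
    return str.substring(start, start + length);
  }

  /** Legacy SUBSTRING(str, start, end): base-zero, start inclusive, end exclusive. */
  public static String legacySubstring(final String str, final int start, final int end) {
    return str.substring(start, end);
  }

  // Converts a base-one position to a base-zero index clamped to [0, str.length()].
  private static int startIndex(final String str, final int pos) {
    return Math.max(0, Math.min(pos - 1, str.length()));
  }

  public static void main(final String[] args) {
    System.out.println(substring("Confluent", 2, 5));       // onflu
    System.out.println(substring("Confluent", 4));          // fluent
    System.out.println(legacySubstring("Confluent", 2, 5)); // nfl
  }
}
```

The same arguments `(2, 5)` produce `onflu` under the new convention and `nfl` under the legacy one. This difference is why the legacy flag exists for queries written against the old syntax.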
big-andy-coates authored Aug 3, 2018
1 parent 7019545 commit 5455868
Showing 12 changed files with 711 additions and 105 deletions.
17 changes: 15 additions & 2 deletions docs/syntax-reference.rst
@@ -1058,8 +1058,21 @@ Scalar functions
| | | quotes in the timestamp format can be escaped with|
| | | '', for example: 'yyyy-MM-dd''T''HH:mm:ssX'. |
+------------------------+------------------------------------------------------------+---------------------------------------------------+
| SUBSTRING | ``SUBSTRING(col1, 2, 5)`` | Return the substring with the start and end |
| | | indices. |
| SUBSTRING | ``SUBSTRING(col1, 2, 5)`` | ``SUBSTRING(str, pos, [len])``. |
| | | Return a substring of ``str`` that starts at |
| | | ``pos`` and has length ``len``, or continues to |
| | | the end of the string. |
| | | |
| | | NOTE: prior to v5.1 of KSQL the syntax was: |
| | | ``SUBSTRING(str, start, [end])``, |
| | | where ``start`` and ``end`` were base-zero |
| | | indexes of the start (inclusive) and end |
| | | (exclusive) of the substring. |
| | | |
| | | It is possible to switch back to this legacy mode |
| | | by setting |
| | | ``ksql.functions.substring.legacy.args`` to |
| | | ``true``. |
+------------------------+------------------------------------------------------------+---------------------------------------------------+
| TIMESTAMPTOSTRING | ``TIMESTAMPTOSTRING(ROWTIME, 'yyyy-MM-dd HH:mm:ss.SSS')`` | Converts a BIGINT millisecond timestamp value into|
| | | the string representation of the timestamp in |
22 changes: 10 additions & 12 deletions docs/udf.rst
@@ -44,8 +44,9 @@ Conversely, using boxed types indicates the function can accept null values for
It is up to the implementor of the UDF to choose which is most appropriate.
A common pattern is to return ``null`` if the input is ``null``, though generally this is only for
parameters that are expected to be supplied from the source row being processed. For example,
a ``substring(String value, int beginIndex)`` UDF might return null if ``value`` is null, but a
null ``beginIndex`` parameter would be treated as an error, and hence should be a primitive.
a ``substring(String str, int pos)`` UDF might return null if ``str`` is null, but a
null ``pos`` parameter would be treated as an error, and hence should be a primitive.
(In fact, the built-in substring is more lenient and returns null if ``pos`` is null.)

The return type of a UDF can also be a primitive or boxed type. A primitive return type indicates
the function will never return ``null``, whereas a boxed type indicates it may return ``null``.
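The null-handling convention above can be illustrated without the UDF framework. This minimal sketch uses plain Java with hypothetical method names and no `@Udf` annotations. A primitive `int pos` parameter rejects null outright, because auto-unboxing a null `Integer` throws `NullPointerException`. A boxed `Integer pos` instead lets the function decide to return null, as the built-in substring does. Base-one positions are assumed, to match the new SUBSTRING semantics.

```java
/**
 * Hypothetical sketch of the primitive-vs-boxed parameter convention,
 * written as plain Java methods so it runs without KSQL on the classpath.
 */
public final class NullHandlingSketch {

  private NullHandlingSketch() {
  }

  // Strict variant: pos is a primitive int, so a null pos can never reach the body.
  // A null str (a value from the source row) simply yields null.
  public static String strictSubstring(final String str, final int pos) {
    if (str == null) {
      return null;
    }
    return str.substring(Math.max(0, Math.min(pos - 1, str.length())));
  }

  // Lenient variant: pos is a boxed Integer, so null is accepted and yields null.
  public static String lenientSubstring(final String str, final Integer pos) {
    return pos == null ? null : strictSubstring(str, pos);
  }

  public static void main(final String[] args) {
    final Integer missingPos = null;
    System.out.println(lenientSubstring("abc", missingPos)); // null
    try {
      strictSubstring("abc", missingPos); // auto-unboxing null throws here
    } catch (final NullPointerException e) {
      System.out.println("null pos rejected by the primitive parameter");
    }
  }
}
```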
@@ -129,15 +130,12 @@ of the UDF does, for example:

.. code:: java
@Udf(description = "Returns a string that is a substring of this string. The"
+ " substring begins with the character at the specified startIndex and"
+ " extends to the end of this string.")
public String substring(final String value, final int startIndex)
@Udf(description = "Returns a substring of str that starts at pos"
+ " and continues to the end of the string")
public String substring(final String str, final int pos)
@Udf(description = "Returns a string that is a substring of this string. The"
+ " substring begins with the character at the specified startIndex and"
+ " extends to the character at endIndex -1.")
public String substring(final String value, final int startIndex, final int endIndex)
@Udf(description = "Returns a substring of str that starts at pos and is of length len")
public String substring(final String str, final int pos, final int len)
UdfParameter Annotation
~~~~~~~~~~~~~~~~~~~~~~~
@@ -154,8 +152,8 @@ can be used to better describe what the parameter does, for example:
@Udf
public String substring(
@UdfParameter("Value") final String value,
@UdfParameter(value = "Value", description = "Zero based start index") final int startIndex)
@UdfParameter("str") final String str,
@UdfParameter(value = "pos", description = "Starting position of the substring") final int pos)
Configurable UDF
~~~~~~~~~~~~~~~~
79 changes: 62 additions & 17 deletions ksql-cli/src/main/java/io/confluent/ksql/cli/console/Console.java
@@ -19,6 +19,7 @@
import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.util.StringTokenizer;
import org.apache.commons.lang3.StringUtils;
import org.jline.reader.EndOfFileException;
import org.jline.terminal.Terminal;
@@ -700,19 +701,21 @@ private void printQueryDescriptionList(final QueryDescriptionList queryDescripti

private void printFunctionDescription(final FunctionDescriptionList describeFunction) {
final String functionName = describeFunction.getName().toUpperCase();
writer().printf("%-12s: %s%n", "Name", functionName);
final String baseFormat = "%-12s: %s%n";
final String subFormat = "\t%-12s: %s%n";
writer().printf(baseFormat, "Name", functionName);
if (!describeFunction.getAuthor().trim().isEmpty()) {
writer().printf("%-12s: %s%n", "Author", describeFunction.getAuthor());
writer().printf(baseFormat, "Author", describeFunction.getAuthor());
}
if (!describeFunction.getVersion().trim().isEmpty()) {
writer().printf("%-12s: %s%n", "Version", describeFunction.getVersion());
writer().printf(baseFormat, "Version", describeFunction.getVersion());
}
if (!describeFunction.getDescription().trim().isEmpty()) {
writer().printf("%-12s: %s%n", "Overview", describeFunction.getDescription());
}
writer().printf("%-12s: %s%n", "Type", describeFunction.getType().name());
writer().printf("%-12s: %s%n", "Jar", describeFunction.getPath());
writer().printf("%-12s: %n", "Variations");

printDescription(baseFormat, "Overview", describeFunction.getDescription());

writer().printf(baseFormat, "Type", describeFunction.getType().name());
writer().printf(baseFormat, "Jar", describeFunction.getPath());
writer().printf(baseFormat, "Variations", "");
final Collection<FunctionInfo> functions = describeFunction.getFunctions();
functions.forEach(functionInfo -> {
final String arguments = functionInfo.getArguments().stream()
@@ -722,18 +725,60 @@ private void printFunctionDescription(final FunctionDescriptionList describeFunc
.collect(Collectors.joining(", "));

writer().printf("%n\t%-12s: %s(%s)%n", "Variation", functionName, arguments);
writer().printf("\t%-12s: %s%n", "Returns", functionInfo.getReturnType());
if (!functionInfo.getDescription().trim().isEmpty()) {
writer().printf("\t%-12s: %s%n", "Description", functionInfo.getDescription());
}

functionInfo.getArguments().stream()
.filter(a -> !a.getDescription().trim().isEmpty())
.forEach(a -> writer().printf("\t%-12s: %s%n", a.getName(), a.getDescription()));

writer().printf(subFormat, "Returns", functionInfo.getReturnType());
printDescription(subFormat, "Description", functionInfo.getDescription());
functionInfo.getArguments()
.forEach(a -> printDescription(subFormat, a.getName(), a.getDescription()));
}
);
}

private void printDescription(final String format, final String name, final String description) {
final String trimmed = description.trim();
if (trimmed.isEmpty()) {
return;
}

final int labelLen = String.format(format.replace("%n", ""), name, "")
.replace("\t", " ")
.length();

final int width = Math.max(getWidth(), 80) - labelLen;

final String fixedWidth = splitLongLine(trimmed, width);

final String indent = String.format("%-" + labelLen + "s", "");

final String result = fixedWidth
.replace(System.lineSeparator(), System.lineSeparator() + indent);

writer().printf(format, name, result);
}

private static String splitLongLine(final String input, final int maxLineLength) {
final StringTokenizer spaceTok = new StringTokenizer(input, " \n", true);
final StringBuilder output = new StringBuilder(input.length());
int lineLen = 0;
while (spaceTok.hasMoreTokens()) {
final String word = spaceTok.nextToken();
final boolean isNewLineChar = word.equals("\n");

if (isNewLineChar || lineLen + word.length() > maxLineLength) {
output.append(System.lineSeparator());
lineLen = 0;

if (isNewLineChar) {
continue;
}
}

output.append(word);
lineLen += word.length();
}
return output.toString();
}

private void printAsJson(final Object o) throws IOException {
if (!((o instanceof PropertiesList || (o instanceof KsqlEntityList)))) {
log.warn(
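The wrapping logic in `splitLongLine` and the continuation indent in `printDescription` can be tested in isolation. The sketch below copies the tokenizer loop from the diff into a standalone class. The class name and `indentContinuations` are hypothetical, and `"\n"` replaces `System.lineSeparator()` so the output is the same on every platform. The loop emits the line break before the token that would overflow. As a result, a line can end with a trailing space, which matches the expected output in `ConsoleTest`.

```java
import java.util.StringTokenizer;

/** Standalone copy of the wrapping helpers from Console.java (hypothetical class). */
public final class WrapSketch {

  private WrapSketch() {
  }

  // Same loop as Console.splitLongLine, but breaks with "\n" rather than
  // System.lineSeparator(). Spaces and newlines are returned as tokens.
  static String splitLongLine(final String input, final int maxLineLength) {
    final StringTokenizer spaceTok = new StringTokenizer(input, " \n", true);
    final StringBuilder output = new StringBuilder(input.length());
    int lineLen = 0;
    while (spaceTok.hasMoreTokens()) {
      final String word = spaceTok.nextToken();
      final boolean isNewLineChar = word.equals("\n");

      // Start a new line on an explicit newline, or before a token that would overflow.
      if (isNewLineChar || lineLen + word.length() > maxLineLength) {
        output.append("\n");
        lineLen = 0;
        if (isNewLineChar) {
          continue; // the newline itself has already been emitted
        }
      }
      output.append(word);
      lineLen += word.length();
    }
    return output.toString();
  }

  // Same idea as printDescription: pad each continuation line by labelLen spaces
  // so it lines up under the first line. labelLen must be positive for "%-Ns".
  static String indentContinuations(final String wrapped, final int labelLen) {
    final String indent = String.format("%-" + labelLen + "s", "");
    return wrapped.replace("\n", "\n" + indent);
  }

  public static void main(final String[] args) {
    // "|" marks line breaks so the trailing spaces are visible.
    System.out.println(splitLongLine("aaa bbb ccc", 6).replace("\n", "|")); // aaa |bbb |ccc
    System.out.println(indentContinuations(splitLongLine("one\ntwo", 80), 4));
  }
}
```

With a limit of 7 instead of 6, the overflowing token is the space itself. The second line then starts with a space: `"aaa bbb\n ccc"`.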
57 changes: 33 additions & 24 deletions ksql-cli/src/test/java/io/confluent/ksql/CliTest.java
@@ -63,6 +63,7 @@
import io.confluent.ksql.version.metrics.VersionCheckerAgent;

import static io.confluent.ksql.TestResult.build;
import static io.confluent.ksql.testutils.AssertEventually.assertThatEventually;
import static io.confluent.ksql.util.KsqlConfig.KSQL_PERSISTENT_QUERY_NAME_PREFIX_CONFIG;
import static io.confluent.ksql.util.KsqlConfig.KSQL_PERSISTENT_QUERY_NAME_PREFIX_DEFAULT;
import static io.confluent.ksql.util.KsqlConfig.KSQL_SERVICE_ID_CONFIG;
@@ -265,14 +266,15 @@ private static void selectWithLimit(String selectQuery, int limit, TestResult.Or

@Test
public void testPrint() throws InterruptedException {

Thread wait = new Thread(() -> run("print 'ORDER_TOPIC' FROM BEGINNING INTERVAL 2;", false));
wait.start();
Thread.sleep(1000);
wait.interrupt();

String terminalOutput = terminal.getOutputString();
assertThat(terminalOutput, containsString("Format:JSON"));
final Thread thread =
new Thread(() -> run("print 'ORDER_TOPIC' FROM BEGINNING INTERVAL 2;", false));
thread.start();

try {
assertThatEventually(() -> terminal.getOutputString(), containsString("Format:JSON"));
} finally {
thread.interrupt();
}
}

@Test
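`testPrint` now polls the terminal output instead of sleeping for a fixed second and then checking once. Below is a dependency-free sketch of such a polling assertion. KSQL's `AssertEventually.assertThatEventually`, as called in the diff, receives a Hamcrest matcher (`containsString`). This hypothetical `assertEventually` differs in three ways, all of them assumptions: it takes a `Predicate`, a caller-supplied timeout, and a fixed 10 ms poll interval.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical polling assertion; not KSQL's AssertEventually. */
public final class EventuallySketch {

  private EventuallySketch() {
  }

  // Re-reads the supplier until the condition holds or the timeout elapses.
  public static <T> T assertEventually(
      final Supplier<T> actual, final Predicate<T> condition, final long timeoutMs) {
    final long deadline = System.currentTimeMillis() + timeoutMs;
    T value = actual.get();
    while (!condition.test(value)) {
      if (System.currentTimeMillis() >= deadline) {
        throw new AssertionError(
            "Condition not met within " + timeoutMs + " ms; last value: " + value);
      }
      try {
        Thread.sleep(10); // back off briefly before polling again
      } catch (final InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new AssertionError("Interrupted while waiting", e);
      }
      value = actual.get();
    }
    return value;
  }

  public static void main(final String[] args) {
    final StringBuffer output = new StringBuffer(); // thread-safe stand-in for terminal output
    final Thread writer = new Thread(() -> output.append("Format:JSON"));
    writer.start();
    try {
      System.out.println(assertEventually(output::toString, s -> s.contains("Format:JSON"), 5_000));
    } finally {
      writer.interrupt(); // mirrors the try/finally clean-up in testPrint
    }
  }
}
```

Polling returns as soon as the output appears, and it only fails after the full timeout. A fixed sleep, by contrast, is either wasted time or too short on a slow machine.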
@@ -564,27 +566,34 @@ public void shouldDescribeScalarFunction() throws Exception {

@Test
public void shouldDescribeOverloadedScalarFunction() throws Exception {
final String expectedSummary =
// Given:
localCli.handleLine("describe function substring;");

// Then:
final String output = terminal.getOutputString();

// Summary output:
assertThat(output, containsString(
"Name : SUBSTRING\n"
+ "Author : Confluent\n"
+ "Overview : returns a substring of the passed in value\n"
+ "Type : scalar\n"
+ "Overview : Returns a substring of the passed in value.\n"
));
assertThat(output, containsString(
"Type : scalar\n"
+ "Jar : internal\n"
+ "Variations : \n";
+ "Variations :"
));

final String expectedVariant =
"\tVariation : SUBSTRING(value VARCHAR, startIndex INT, endIndex INT)\n"
// Variant output:
assertThat(output, containsString(
"\tVariation : SUBSTRING(str VARCHAR, pos INT)\n"
+ "\tReturns : VARCHAR\n"
+ "\tDescription : Returns a string that is a substring of this string. "
+ "The substring begins with the character at the specified startIndex and extends to the character at endIndex -1.\n"
+ "\tstartIndex : The zero-based start index, inclusive.\n"
+ "\tendIndex : The zero-based end index, exclusive.";

localCli.handleLine("describe function substring;");

final String output = terminal.getOutputString();
assertThat(output, containsString(expectedSummary));
assertThat(output, containsString(expectedVariant));
+ "\tDescription : Returns a substring of str that starts at pos and continues to the end"
));
assertThat(output, containsString(
"\tstr : The source string. If null, then function returns null.\n"
+ "\tpos : The base-one position the substring starts from."
));
}

@Test
@@ -1,4 +1,4 @@
/**
/*
* Copyright 2017 Confluent Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
@@ -18,11 +18,14 @@

import com.google.common.collect.ImmutableList;

import io.confluent.ksql.rest.entity.ArgumentInfo;
import io.confluent.ksql.rest.entity.EntityQueryId;
import io.confluent.ksql.rest.entity.FunctionDescriptionList;
import io.confluent.ksql.rest.entity.FunctionInfo;
import io.confluent.ksql.rest.entity.FunctionType;
import io.confluent.ksql.rest.entity.RunningQuery;
import io.confluent.ksql.rest.entity.FieldInfo;
import io.confluent.ksql.rest.util.EntityUtil;
import org.apache.kafka.connect.data.Field;
import org.apache.kafka.connect.data.SchemaBuilder;
import org.junit.After;
import org.junit.Test;
@@ -68,8 +71,8 @@
@RunWith(Parameterized.class)
public class ConsoleTest {

private TestTerminal terminal;
private KsqlRestClient client;
private final TestTerminal terminal;
private final KsqlRestClient client;

@Parameterized.Parameters(name = "{0}")
public static Collection<OutputFormat> data() {
@@ -163,6 +166,77 @@ public void shouldPrintTopicDescribeExtended() throws IOException {
}
}

@Test
public void shouldPrintFunctionDescription() throws IOException {
final KsqlEntityList entityList = new KsqlEntityList(ImmutableList.of(
new FunctionDescriptionList(
"DESCRIBE FUNCTION foo;",
"FOO",
"Description that is very, very, very, very, very, very, very, very, very, "
+ "very, very, very, very, very, very, very, very, very, very, very, very long\n"
+ "and containing new lines\n"
+ "\tAND TABS\n"
+ "too!",
"Andy",
"v1.1.0",
"some.jar",
ImmutableList.of(new FunctionInfo(
ImmutableList.of(
new ArgumentInfo(
"arg1",
"INT",
"Another really, really, really, really, really, really, really,"
+ "really, really, really, really, really, really, really, really "
+ " really, really, really, really, really, really, really, long\n"
+ "description\n"
+ "\tContaining Tabs\n"
+ "and stuff"
)
),
"LONG",
"The function description, which too can be really, really, really, "
+ "really, really, really, really, really, really, really, really, really, "
+ "really, really, really, really, really, really, really, really, long\n"
+ "and contains\n\ttabs and stuff"
)), FunctionType.scalar)));

terminal.printKsqlEntityList(entityList);

final String output = terminal.getOutputString();
if (terminal.getOutputFormat() == OutputFormat.JSON) {
assertThat(output, containsString("\"name\" : \"FOO\""));
} else {
final String expected = ""
+ "Name : FOO\n"
+ "Author : Andy\n"
+ "Version : v1.1.0\n"
+ "Overview : Description that is very, very, very, very, very, very, very, very, very, very, very, \n"
+ " very, very, very, very, very, very, very, very, very, very long\n"
+ " and containing new lines\n"
+ " \tAND TABS\n"
+ " too!\n"
+ "Type : scalar\n"
+ "Jar : some.jar\n"
+ "Variations : \n"
+ "\n"
+ "\tVariation : FOO(arg1 INT)\n"
+ "\tReturns : LONG\n"
+ "\tDescription : The function description, which too can be really, really, really, really, really, \n"
+ " really, really, really, really, really, really, really, really, really, really, \n"
+ " really, really, really, really, really, long\n"
+ " and contains\n"
+ " \ttabs and stuff\n"
+ "\targ1 : Another really, really, really, really, really, really, really,really, really, \n"
+ " really, really, really, really, really, really really, really, really, really, \n"
+ " really, really, really, long\n"
+ " description\n"
+ " \tContaining Tabs\n"
+ " and stuff";

assertThat(output, containsString(expected));
}
}

private List<FieldInfo> buildTestSchema(int size) {
SchemaBuilder dataSourceBuilder = SchemaBuilder.struct().name("TestSchema");
for (int i = 0; i < size; i++) {