Add REGEXP_LIKE, fix bugs in REGEXP_EXTRACT. #9893

Merged: 16 commits, Jun 3, 2020
3 changes: 2 additions & 1 deletion docs/misc/math-expr.md
@@ -76,7 +76,8 @@ The following built-in functions are available.
|like|like(expr, pattern[, escape]) is equivalent to SQL `expr LIKE pattern`|
|lookup|lookup(expr, lookup-name) looks up expr in a registered [query-time lookup](../querying/lookups.md)|
|parse_long|parse_long(string[, radix]) parses a string as a long with the given radix, or 10 (decimal) if a radix is not provided.|
- |regexp_extract|regexp_extract(expr, pattern[, index]) applies a regular expression pattern and extracts a capture group index, or null if there is no match. If index is unspecified or zero, returns the substring that matched the pattern.|
+ |regexp_extract|regexp_extract(expr, pattern[, index]) applies a regular expression pattern and extracts a capture group index, or null if there is no match. If index is unspecified or zero, returns the substring that matched the pattern. The pattern may match anywhere inside `expr`; if you want to match the entire string instead, use the `^` and `$` markers at the start and end of your pattern.|
+ |regexp_like|regexp_like(expr, pattern) returns whether `expr` matches regular expression `pattern`. The pattern may match anywhere inside `expr`; if you want to match the entire string instead, use the `^` and `$` markers at the start and end of your pattern. |
|replace|replace(expr, pattern, replacement) replaces pattern with replacement|
|substring|substring(expr, index, length) behaves like java.lang.String's substring|
|right|right(expr, length) returns the rightmost length characters from a string|
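
The "match anywhere" rule for `regexp_extract` and `regexp_like` follows from `java.util.regex.Matcher.find()`, which the new `RegexpLikeExprMacro` in this PR calls. A minimal standalone sketch, assuming plain `java.util.regex` semantics; the class and method names are hypothetical, not Druid APIs:

```java
import java.util.regex.Pattern;

// Hypothetical sketch, not Druid code: models the documented "match anywhere" rule.
final class RegexpLikeAnchoringSketch
{
  // Matcher.find() succeeds if the pattern matches any substring of the input.
  static boolean regexpLike(String expr, String pattern)
  {
    return Pattern.compile(pattern).matcher(expr).find();
  }

  public static void main(String[] args)
  {
    System.out.println(regexpLike("druid-broker", "broker"));   // true: substring match
    System.out.println(regexpLike("druid-broker", "^broker$")); // false: anchors require a full-string match
    System.out.println(regexpLike("broker", "^broker$"));       // true
  }
}
```

Without `^` and `$` the pattern only needs to occur somewhere in the input; with them (and without the `MULTILINE` flag) it must cover the whole string.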
9 changes: 5 additions & 4 deletions docs/querying/sql.md
@@ -322,17 +322,18 @@ String functions accept strings, and return a type appropriate to the function.
|`LOWER(expr)`|Returns expr in all lowercase.|
|`PARSE_LONG(string[, radix])`|Parses a string into a long (BIGINT) with the given radix, or 10 (decimal) if a radix is not provided.|
|`POSITION(needle IN haystack [FROM fromIndex])`|Returns the index of needle within haystack, with indexes starting from 1. The search will begin at fromIndex, or 1 if fromIndex is not specified. If the needle is not found, returns 0.|
- |`REGEXP_EXTRACT(expr, pattern, [index])`|Apply regular expression pattern and extract a capture group, or null if there is no match. If index is unspecified or zero, returns the substring that matched the pattern.|
+ |`REGEXP_EXTRACT(expr, pattern, [index])`|Apply regular expression `pattern` to `expr` and extract a capture group, or `NULL` if there is no match. If index is unspecified or zero, returns the first substring that matched the pattern. The pattern may match anywhere inside `expr`; if you want to match the entire string instead, use the `^` and `$` markers at the start and end of your pattern. Note: when `druid.generic.useDefaultValueForNull = true`, it is not possible to differentiate an empty-string match from a non-match (both will return `NULL`).|
+ |`REGEXP_LIKE(expr, pattern)`|Returns whether `expr` matches regular expression `pattern`. The pattern may match anywhere inside `expr`; if you want to match the entire string instead, use the `^` and `$` markers at the start and end of your pattern. Similar to [`LIKE`](#comparison-operators), but uses regexps instead of LIKE patterns. Especially useful in WHERE clauses.|
|`REPLACE(expr, pattern, replacement)`|Replaces pattern with replacement in expr, and returns the result.|
|`STRPOS(haystack, needle)`|Returns the index of needle within haystack, with indexes starting from 1. If the needle is not found, returns 0.|
|`SUBSTRING(expr, index, [length])`|Returns a substring of expr starting at index, with a max length, both measured in UTF-16 code units.|
|`RIGHT(expr, [length])`|Returns the rightmost length characters from expr.|
|`LEFT(expr, [length])`|Returns the leftmost length characters from expr.|
|`SUBSTR(expr, index, [length])`|Synonym for SUBSTRING.|
|<code>TRIM([BOTH &#124; LEADING &#124; TRAILING] [<chars> FROM] expr)</code>|Returns expr with characters removed from the leading, trailing, or both ends of "expr" if they are in "chars". If "chars" is not provided, it defaults to " " (a space). If the directional argument is not provided, it defaults to "BOTH".|
- |`BTRIM(expr[, chars])`|Alternate form of `TRIM(BOTH <chars> FROM <expr>`).|
- |`LTRIM(expr[, chars])`|Alternate form of `TRIM(LEADING <chars> FROM <expr>`).|
- |`RTRIM(expr[, chars])`|Alternate form of `TRIM(TRAILING <chars> FROM <expr>`).|
+ |`BTRIM(expr[, chars])`|Alternate form of `TRIM(BOTH <chars> FROM <expr>)`.|
+ |`LTRIM(expr[, chars])`|Alternate form of `TRIM(LEADING <chars> FROM <expr>)`.|
+ |`RTRIM(expr[, chars])`|Alternate form of `TRIM(TRAILING <chars> FROM <expr>)`.|
|`UPPER(expr)`|Returns expr in all uppercase.|
|`REVERSE(expr)`|Reverses expr.|
|`REPEAT(expr, [N])`|Repeats expr N times|
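
The capture-group behavior of `REGEXP_EXTRACT` can be sketched the same way. This is a hypothetical helper, not Druid's implementation; it mirrors the `matcher.find() ? matcher.group(index) : null` logic in `RegexpExtractExpr.eval` from this PR and deliberately ignores the `druid.generic.useDefaultValueForNull` distinction:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of REGEXP_EXTRACT semantics; null-handling modes are ignored.
final class RegexpExtractSketch
{
  // Returns capture group `index` of the first match, or null if there is no match.
  // Group 0 is the entire matched substring.
  static String regexpExtract(String expr, String pattern, int index)
  {
    final Matcher matcher = Pattern.compile(pattern).matcher(expr);
    return matcher.find() ? matcher.group(index) : null;
  }

  public static void main(String[] args)
  {
    final String url = "https://druid.apache.org/docs/latest";
    System.out.println(regexpExtract(url, "https?://([^/]+)/", 0)); // https://druid.apache.org/
    System.out.println(regexpExtract(url, "https?://([^/]+)/", 1)); // druid.apache.org
    System.out.println(regexpExtract("a1b22c333", "[0-9]+", 0));    // 1 (first match only)
    System.out.println(regexpExtract(url, "ftp://", 0));            // null
  }
}
```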
@@ -47,7 +47,7 @@ public class HllSketchEstimateWithErrorBoundsOperatorConversion extends DirectOp
.operatorBuilder(StringUtils.toUpperCase(FUNCTION_NAME))
.operandTypes(SqlTypeFamily.ANY, SqlTypeFamily.INTEGER)
.requiredOperands(1)
- .returnType(SqlTypeName.OTHER)
+ .returnTypeNonNull(SqlTypeName.OTHER)
.build();


@@ -44,7 +44,7 @@ public class HllSketchToStringOperatorConversion extends DirectOperatorConversio
private static final SqlFunction SQL_FUNCTION = OperatorConversions
.operatorBuilder(StringUtils.toUpperCase(FUNCTION_NAME))
.operandTypes(SqlTypeFamily.ANY)
- .returnType(SqlTypeName.VARCHAR)
+ .returnTypeNonNull(SqlTypeName.VARCHAR)
.build();

public HllSketchToStringOperatorConversion()
@@ -34,7 +34,7 @@ public class DoublesSketchQuantileOperatorConversion extends DoublesSketchSingle
private static final SqlFunction SQL_FUNCTION = OperatorConversions
.operatorBuilder(StringUtils.toUpperCase(FUNCTION_NAME))
.operandTypes(SqlTypeFamily.ANY, SqlTypeFamily.NUMERIC)
- .returnType(SqlTypeName.DOUBLE)
+ .returnTypeNonNull(SqlTypeName.DOUBLE)
.build();


@@ -34,7 +34,7 @@ public class DoublesSketchRankOperatorConversion extends DoublesSketchSingleArgB
private static final SqlFunction SQL_FUNCTION = OperatorConversions
.operatorBuilder(StringUtils.toUpperCase(FUNCTION_NAME))
.operandTypes(SqlTypeFamily.ANY, SqlTypeFamily.NUMERIC)
- .returnType(SqlTypeName.DOUBLE)
+ .returnTypeNonNull(SqlTypeName.DOUBLE)
.build();

public DoublesSketchRankOperatorConversion()
@@ -44,7 +44,7 @@ public class DoublesSketchSummaryOperatorConversion extends DirectOperatorConver
private static final SqlFunction SQL_FUNCTION = OperatorConversions
.operatorBuilder(StringUtils.toUpperCase(FUNCTION_NAME))
.operandTypes(SqlTypeFamily.ANY)
- .returnType(SqlTypeName.VARCHAR)
+ .returnTypeNonNull(SqlTypeName.VARCHAR)
.build();

public DoublesSketchSummaryOperatorConversion()
@@ -46,7 +46,7 @@ public class ThetaSketchEstimateWithErrorBoundsOperatorConversion extends Direct
private static final SqlFunction SQL_FUNCTION = OperatorConversions
.operatorBuilder(StringUtils.toUpperCase(FUNCTION_NAME))
.operandTypes(SqlTypeFamily.ANY, SqlTypeFamily.INTEGER)
- .returnType(SqlTypeName.OTHER)
+ .returnTypeNonNull(SqlTypeName.OTHER)
.build();


@@ -98,4 +98,31 @@ static void checkLiteralArgument(String functionName, Expr arg, String argName)
{
Preconditions.checkArgument(arg.isLiteral(), createErrMsg(functionName, argName + " arg must be a literal"));
}
+
+ /**
+ * True if Expr is a string literal.
+ *
+ * In non-SQL-compliant null handling mode, this method will return true for null literals as well (because they are
+ * treated equivalently to empty strings, and we cannot tell the difference.)
+ *
+ * In SQL-compliant null handling mode, this method will return true for actual strings only, not nulls.
+ */
+ static boolean isStringLiteral(final Expr expr)
+ {
+ return (expr.isLiteral() && expr.getLiteralValue() instanceof String)
+ || (NullHandling.replaceWithDefault() && isNullLiteral(expr));
+ }
+
+ /**
+ * True if Expr is a null literal.
+ *
+ * In non-SQL-compliant null handling mode, this method will return true for either a null literal or an empty string
+ * literal (they are treated equivalently and we cannot tell the difference).
+ *
+ * In SQL-compliant null handling mode, this method will only return true for an actual null literal.
+ */
+ static boolean isNullLiteral(final Expr expr)
+ {
+ return expr.isLiteral() && expr.getLiteralValue() == null;
+ }
}
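
The two helpers behave differently depending on the null-handling mode. The sketch below is a hypothetical model, not Druid code: a literal is represented by its Java value, and, following the javadoc, it assumes that in default-value mode (`NullHandling.replaceWithDefault()` returning true) an empty-string literal is stored as `null`.

```java
// Hypothetical model of ExprUtils.isStringLiteral / isNullLiteral; not Druid code.
// A literal Expr is represented by its Java value.
final class LiteralChecksSketch
{
  // Assumption from the javadoc: in default-value mode an empty-string literal
  // is stored as null, so it cannot be told apart from a null literal.
  static Object literalValue(String raw, boolean replaceWithDefault)
  {
    return (replaceWithDefault && "".equals(raw)) ? null : raw;
  }

  static boolean isNullLiteral(Object value)
  {
    return value == null;
  }

  static boolean isStringLiteral(Object value, boolean replaceWithDefault)
  {
    return value instanceof String || (replaceWithDefault && isNullLiteral(value));
  }

  public static void main(String[] args)
  {
    for (boolean replaceWithDefault : new boolean[]{true, false}) {
      for (String raw : new String[]{"abc", "", null}) {
        final Object value = literalValue(raw, replaceWithDefault);
        System.out.printf(
            "replaceWithDefault=%b, literal=%s: isStringLiteral=%b, isNullLiteral=%b%n",
            replaceWithDefault,
            raw == null ? "null" : "'" + raw + "'",
            isStringLiteral(value, replaceWithDefault),
            isNullLiteral(value)
        );
      }
    }
  }
}
```

Under this model, in default-value mode `''` and `null` are each both a string literal and a null literal; in SQL-compatible mode `''` is only a string literal and `null` is only a null literal.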
@@ -52,12 +52,18 @@ public Expr apply(final List<Expr> args)
final Expr patternExpr = args.get(1);
final Expr indexExpr = args.size() > 2 ? args.get(2) : null;

- if (!patternExpr.isLiteral() || (indexExpr != null && !indexExpr.isLiteral())) {
- throw new IAE("Function[%s] pattern and index must be literals", name());
+ if (!ExprUtils.isStringLiteral(patternExpr)) {
+ throw new IAE("Function[%s] pattern must be a string literal", name());
}

+ if (indexExpr != null && (!indexExpr.isLiteral() || !(indexExpr.getLiteralValue() instanceof Number))) {
+ throw new IAE("Function[%s] index must be a numeric literal", name());
+ }
+
// Precompile the pattern.
- final Pattern pattern = Pattern.compile(String.valueOf(patternExpr.getLiteralValue()));
+ final Pattern pattern = Pattern.compile(
+ StringUtils.nullToEmptyNonDruidDataString((String) patternExpr.getLiteralValue())
+ );

final int index = indexExpr == null ? 0 : ((Number) indexExpr.getLiteralValue()).intValue();

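Reading the old and new checks side by side: the old code only required both arguments to be literals, so, as the diff reads, a string `index` literal would pass validation and then fail at the `(Number)` cast, and a null pattern literal would be compiled via `String.valueOf` as the literal text `"null"`. A hypothetical sketch of the stricter rules, where literal arguments are modeled as plain Java values and a `null` index means the argument was omitted; it also assumes `nullToEmptyNonDruidDataString` maps `null` to the empty string:

```java
import java.util.regex.Pattern;

// Hypothetical sketch of the new regexp_extract argument checks; not Druid's API.
// Literal arguments are plain Java values; indexLiteral == null means "index omitted".
final class RegexpExtractArgsSketch
{
  final Pattern pattern;
  final int index;

  RegexpExtractArgsSketch(Object patternLiteral, Object indexLiteral, boolean replaceWithDefault)
  {
    // Mirrors ExprUtils.isStringLiteral: a null literal counts as a string only in default-value mode.
    final boolean isString = patternLiteral instanceof String || (replaceWithDefault && patternLiteral == null);
    if (!isString) {
      throw new IllegalArgumentException("pattern must be a string literal");
    }
    if (indexLiteral != null && !(indexLiteral instanceof Number)) {
      throw new IllegalArgumentException("index must be a numeric literal");
    }
    // Compile once. Assumed to match nullToEmptyNonDruidDataString: null becomes the empty pattern.
    this.pattern = Pattern.compile(patternLiteral == null ? "" : (String) patternLiteral);
    this.index = indexLiteral == null ? 0 : ((Number) indexLiteral).intValue();
  }

  public static void main(String[] args)
  {
    final RegexpExtractArgsSketch ok = new RegexpExtractArgsSketch("a(b)", 1, false);
    System.out.println(ok.pattern.pattern() + " / group " + ok.index); // a(b) / group 1
    try {
      new RegexpExtractArgsSketch("a(b)", "1", false);
    }
    catch (IllegalArgumentException e) {
      System.out.println(e.getMessage()); // index must be a numeric literal
    }
  }
}
```

Under these rules an empty pattern (or, in default-value mode, a null one) compiles to the empty regex, which matches every input at position 0.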
@@ -72,10 +78,16 @@ private RegexpExtractExpr(Expr arg)
@Override
public ExprEval eval(final ObjectBinding bindings)
{
- String s = arg.eval(bindings).asString();
- final Matcher matcher = pattern.matcher(NullHandling.nullToEmptyIfNeeded(s));
- final String retVal = matcher.find() ? matcher.group(index) : null;
- return ExprEval.of(NullHandling.emptyToNullIfNeeded(retVal));
+ final String s = NullHandling.nullToEmptyIfNeeded(arg.eval(bindings).asString());
+
+ if (s == null) {
+ // True nulls do not match anything. Note: this branch only executes in SQL-compatible null handling mode.
+ return ExprEval.of(null);
+ } else {
+ final Matcher matcher = pattern.matcher(NullHandling.nullToEmptyIfNeeded(s));
+ final String retVal = matcher.find() ? matcher.group(index) : null;
+ return ExprEval.of(retVal);
+ }
}

@Override
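
The new `eval` drops the explicit `emptyToNullIfNeeded` call and returns `null` early for true nulls. A hypothetical model of both null-handling modes follows; it assumes, consistent with the `useDefaultValueForNull` note in the SQL docs, that in default-value mode an empty-string result is reported as `null`:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of the new RegexpExtractExpr.eval null handling; not Druid code.
final class RegexpExtractEvalSketch
{
  static String eval(String input, Pattern pattern, int index, boolean replaceWithDefault)
  {
    // Like NullHandling.nullToEmptyIfNeeded: null becomes "" only in default-value mode.
    final String s = (replaceWithDefault && input == null) ? "" : input;
    if (s == null) {
      // SQL-compatible mode only: a true null matches nothing.
      return null;
    }
    final Matcher matcher = pattern.matcher(s);
    final String retVal = matcher.find() ? matcher.group(index) : null;
    // Assumption: in default-value mode an empty result is reported as null.
    return (replaceWithDefault && "".equals(retVal)) ? null : retVal;
  }

  public static void main(String[] args)
  {
    final Pattern emptyMatch = Pattern.compile("x*"); // matches the empty string
    System.out.println(eval("abc", emptyMatch, 0, false)); // prints an empty line ("")
    System.out.println(eval(null, emptyMatch, 0, false));  // null
    System.out.println(eval("abc", emptyMatch, 0, true));  // null
    System.out.println(eval(null, emptyMatch, 0, true));   // null
  }
}
```

With `x*`, SQL-compatible mode distinguishes the empty-string match on `"abc"` from the null input, while default-value mode returns `null` for both, which is the limitation the `REGEXP_EXTRACT` docs call out.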
@@ -0,0 +1,101 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package org.apache.druid.query.expression;

import org.apache.druid.common.config.NullHandling;
import org.apache.druid.java.util.common.IAE;
import org.apache.druid.java.util.common.StringUtils;
import org.apache.druid.math.expr.Expr;
import org.apache.druid.math.expr.ExprEval;
import org.apache.druid.math.expr.ExprMacroTable;
import org.apache.druid.math.expr.ExprType;

import javax.annotation.Nonnull;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexpLikeExprMacro implements ExprMacroTable.ExprMacro
{
private static final String FN_NAME = "regexp_like";

@Override
public String name()
{
return FN_NAME;
}

@Override
public Expr apply(final List<Expr> args)
{
if (args.size() != 2) {
throw new IAE("Function[%s] must have 2 arguments", name());
}

final Expr arg = args.get(0);
final Expr patternExpr = args.get(1);

if (!ExprUtils.isStringLiteral(patternExpr)) {
throw new IAE("Function[%s] pattern must be a string literal", name());
}

// Precompile the pattern.
final Pattern pattern = Pattern.compile(
StringUtils.nullToEmptyNonDruidDataString((String) patternExpr.getLiteralValue())
);

class RegexpLikeExpr extends ExprMacroTable.BaseScalarUnivariateMacroFunctionExpr
{
private RegexpLikeExpr(Expr arg)
{
super(FN_NAME, arg);
}

@Nonnull
@Override
public ExprEval eval(final ObjectBinding bindings)
{
final String s = NullHandling.nullToEmptyIfNeeded(arg.eval(bindings).asString());

if (s == null) {
// True nulls do not match anything. Note: this branch only executes in SQL-compatible null handling mode.
return ExprEval.of(false, ExprType.LONG);
} else {
final Matcher matcher = pattern.matcher(NullHandling.nullToEmptyIfNeeded(s));

Review comment (Contributor):

nit: Is this `nullToEmptyIfNeeded` still needed, given the `if` block on line 77? Same comment for RegexpExtractMacro.

Unclear to me if there's a performance loss from the extra function call (I'd think it's probably not measurable).

Reply (Contributor Author):

Good catch, I removed it.

return ExprEval.of(matcher.find(), ExprType.LONG);
}
}

@Override
public Expr visit(Shuttle shuttle)
{
Expr newArg = arg.visit(shuttle);
return shuttle.visit(new RegexpLikeExpr(newArg));
}

@Override
public String stringify()
{
return StringUtils.format("%s(%s, %s)", FN_NAME, arg.stringify(), patternExpr.stringify());
}
}
return new RegexpLikeExpr(arg);
}
}
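
Structurally, `apply` validates and compiles the pattern once, and the returned `RegexpLikeExpr` reuses it for every row. A hypothetical sketch of that shape using `java.util.function.Function`, assuming SQL-compatible null handling and modeling the `ExprType.LONG` result as `1L`/`0L`:

```java
import java.util.function.Function;
import java.util.regex.Pattern;

// Hypothetical sketch of RegexpLikeExprMacro's shape; assumes SQL-compatible null handling.
final class RegexpLikeSketch
{
  // Like apply(): compile the pattern once and capture it in the per-row function.
  static Function<String, Long> apply(String patternLiteral)
  {
    final Pattern pattern = Pattern.compile(patternLiteral == null ? "" : patternLiteral);
    return s -> {
      if (s == null) {
        return 0L; // a true null matches nothing
      }
      return pattern.matcher(s).find() ? 1L : 0L; // boolean modeled as a LONG 1/0
    };
  }

  public static void main(String[] args)
  {
    final Function<String, Long> startsWithDruid = apply("^druid");
    System.out.println(startsWithDruid.apply("druid-broker")); // 1
    System.out.println(startsWithDruid.apply("apache-druid")); // 0
    System.out.println(startsWithDruid.apply(null));           // 0
  }
}
```

Compiling in `apply` rather than in `eval` keeps regex compilation out of the per-row path.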
@@ -23,7 +23,6 @@
import org.apache.druid.math.expr.ExprEval;
import org.apache.druid.math.expr.ExprMacroTable;
import org.junit.Assert;
- import org.junit.Before;
import org.junit.Test;

import java.util.Arrays;
@@ -42,36 +41,33 @@ public class IPv4AddressMatchExprMacroTest extends MacroTestBase
private static final Expr SUBNET_10 = ExprEval.of("10.0.0.0/8").toExpr();
private static final Expr NOT_LITERAL = new NotLiteralExpr(null);

- private IPv4AddressMatchExprMacro target;
-
- @Before
- public void setUp()
+ public IPv4AddressMatchExprMacroTest()
{
- target = new IPv4AddressMatchExprMacro();
+ super(new IPv4AddressMatchExprMacro());
}

@Test
public void testTooFewArgs()
{
expectException(IllegalArgumentException.class, "must have 2 arguments");

- target.apply(Collections.emptyList());
+ apply(Collections.emptyList());
}

@Test
public void testTooManyArgs()
{
expectException(IllegalArgumentException.class, "must have 2 arguments");

- target.apply(Arrays.asList(IPV4, SUBNET_192_168, NOT_LITERAL));
+ apply(Arrays.asList(IPV4, SUBNET_192_168, NOT_LITERAL));
}

@Test
public void testSubnetArgNotLiteral()
{
expectException(IllegalArgumentException.class, "subnet arg must be a literal");

- target.apply(Arrays.asList(IPV4, NOT_LITERAL));
+ apply(Arrays.asList(IPV4, NOT_LITERAL));
}

@Test
@@ -80,7 +76,7 @@ public void testSubnetArgInvalid()
expectException(IllegalArgumentException.class, "subnet arg has an invalid format");

Expr invalidSubnet = ExprEval.of("192.168.0.1/invalid").toExpr();
- target.apply(Arrays.asList(IPV4, invalidSubnet));
+ apply(Arrays.asList(IPV4, invalidSubnet));
}

@Test
@@ -182,7 +178,7 @@ public void testInclusive()

private boolean eval(Expr... args)
{
- Expr expr = target.apply(Arrays.asList(args));
+ Expr expr = apply(Arrays.asList(args));
ExprEval eval = expr.eval(ExprUtils.nilBindings());
return eval.asBoolean();
}
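
The test changes replace per-test `@Before` setup with a macro passed to the `MacroTestBase` constructor, and the tests then appear to call an inherited `apply` helper. `MacroTestBase` itself is not shown in this diff, so the following JUnit-free sketch is hypothetical (all names are illustrative) and only shows the shape of the pattern:

```java
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Hypothetical, JUnit-free sketch of constructor-injected macro tests; not the real MacroTestBase.
abstract class MacroTestBaseSketch<T>
{
  private final Function<List<T>, T> macro;

  // Subclasses pass the macro under test, replacing a @Before setUp() method.
  protected MacroTestBaseSketch(Function<List<T>, T> macro)
  {
    this.macro = macro;
  }

  // Shared helper, so tests call apply(...) instead of target.apply(...).
  protected T apply(List<T> args)
  {
    return macro.apply(args);
  }
}

// Hypothetical macro under test: upper-cases its single argument.
final class UpperMacroTestSketch extends MacroTestBaseSketch<String>
{
  UpperMacroTestSketch()
  {
    super(args -> {
      if (args.size() != 1) {
        throw new IllegalArgumentException("must have 1 argument");
      }
      return args.get(0).toUpperCase(Locale.ROOT);
    });
  }
}
```

Passing the macro through the constructor makes the field final and removes the per-test mutable `target`.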
@@ -23,7 +23,6 @@
import org.apache.druid.math.expr.Expr;
import org.apache.druid.math.expr.ExprEval;
import org.junit.Assert;
- import org.junit.Before;
import org.junit.Test;

import java.util.Arrays;
@@ -35,28 +34,25 @@ public class IPv4AddressParseExprMacroTest extends MacroTestBase
private static final long EXPECTED = 3232235521L;
private static final Long NULL = NullHandling.replaceWithDefault() ? NullHandling.ZERO_LONG : null;

- private IPv4AddressParseExprMacro target;
-
- @Before
- public void setUp()
+ public IPv4AddressParseExprMacroTest()
{
- target = new IPv4AddressParseExprMacro();
+ super(new IPv4AddressParseExprMacro());
}

@Test
public void testTooFewArgs()
{
expectException(IllegalArgumentException.class, "must have 1 argument");

- target.apply(Collections.emptyList());
+ apply(Collections.emptyList());
}

@Test
public void testTooManyArgs()
{
expectException(IllegalArgumentException.class, "must have 1 argument");

- target.apply(Arrays.asList(VALID, VALID));
+ apply(Arrays.asList(VALID, VALID));
}

@Test
@@ -154,7 +150,7 @@ public void testValidLongArg()

private Object eval(Expr arg)
{
- Expr expr = target.apply(Collections.singletonList(arg));
+ Expr expr = apply(Collections.singletonList(arg));
ExprEval eval = expr.eval(ExprUtils.nilBindings());
return eval.value();
}