Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update for SQL and PPL parser to accept simple_query_string #53

1 change: 1 addition & 0 deletions ppl/src/main/antlr/OpenSearchPPLLexer.g4
Original file line number Diff line number Diff line change
Expand Up @@ -264,6 +264,7 @@ IF: 'IF';

// RELEVANCE FUNCTIONS AND PARAMETERS
MATCH: 'MATCH';
SIMPLE_QUERY_STRING: 'SIMPLE_QUERY_STRING';
ANALYZER: 'ANALYZER';
FUZZINESS: 'FUZZINESS';
AUTO_GENERATE_SYNONYMS_PHRASE_QUERY:'AUTO_GENERATE_SYNONYMS_PHRASE_QUERY';
Expand Down
21 changes: 20 additions & 1 deletion ppl/src/main/antlr/OpenSearchPPLParser.g4
Original file line number Diff line number Diff line change
Expand Up @@ -216,7 +216,10 @@ booleanExpression

relevanceExpression
: relevanceFunctionName LT_PRTHS
field=relevanceArgValue COMMA query=relevanceArgValue
field=relevanceField COMMA query=relevanceQuery
(COMMA relevanceArg)* RT_PRTHS
| relevanceFunctionNameEx LT_PRTHS
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we rename it? Like relevanceFunctionType1Name and relevanceFunctionType2Name.
Or maybe to make relevanceExpressionType1 and relevanceExpressionType2.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please be explicit with your naming as possible. Type1 and Type2 don't mean anything. If there's an order associated with the types in the list (and no other way to differentiate the types) we can call them first and second or something similar.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's change it to have separate rule for single-field relevance functions and a separate rule for multi-field relevance functions.

This way we can avoid the cumbersome ?: in AstExpressionBuilder. Instead there will be one method to visit single field relevance functions and another to visit multi-field relevance functions.

fields=relevanceFields COMMA query=relevanceQuery
(COMMA relevanceArg)* RT_PRTHS
;

Expand Down Expand Up @@ -308,6 +311,18 @@ relevanceArgName
| BOOST
;

relevanceField
: relevanceArgValue
;

relevanceFields
// simple_query_string function accepts list of field as well in format '["field1", "field2", ... ]'
: (LT_SQR_PRTHS relevanceArgValue (COMMA relevanceArgValue)* RT_SQR_PRTHS)
;

relevanceQuery
: relevanceArgValue;

relevanceArgValue
: qualifiedName
| literalValue
Expand Down Expand Up @@ -353,6 +368,10 @@ relevanceFunctionName
: MATCH
;

relevanceFunctionNameEx
: SIMPLE_QUERY_STRING
;

/** literals and values*/
literalValue
: intervalLiteral
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -252,8 +252,11 @@ public UnresolvedExpression visitConvertedDataType(ConvertedDataTypeContext ctx)

@Override
public UnresolvedExpression visitRelevanceExpression(RelevanceExpressionContext ctx) {
ParserRuleContext func = ctx.relevanceFunctionNameEx();
if (func == null)
func = ctx.relevanceFunctionName();
return new Function(
ctx.relevanceFunctionName().getText().toLowerCase(),
func.getText().toLowerCase(),
relevanceArguments(ctx));
}

Expand Down Expand Up @@ -332,8 +335,11 @@ private List<UnresolvedExpression> relevanceArguments(RelevanceExpressionContext
// all the arguments are defaulted to string values
// to skip environment resolving and function signature resolving
ImmutableList.Builder<UnresolvedExpression> builder = ImmutableList.builder();
ParserRuleContext field = ctx.field;
if (field == null)
field = ctx.fields;
builder.add(new UnresolvedArgument("field",
new Literal(StringUtils.unquoteText(ctx.field.getText()), DataType.STRING)));
new Literal(StringUtils.unquoteText(field.getText()), DataType.STRING)));
builder.add(new UnresolvedArgument("query",
new Literal(StringUtils.unquoteText(ctx.query.getText()), DataType.STRING)));
ctx.relevanceArg().forEach(v -> builder.add(new UnresolvedArgument(
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Should we create 2 different functions as relevanceArguments?
  2. Should we skip StringUtils.unquoteText for fields?

Expand Down
3 changes: 3 additions & 0 deletions sql/src/main/antlr/OpenSearchSQLLexer.g4
Original file line number Diff line number Diff line change
Expand Up @@ -279,6 +279,7 @@ INCLUDE: 'INCLUDE';
IN_TERMS: 'IN_TERMS';
MATCHPHRASE: 'MATCHPHRASE';
MATCH_PHRASE: 'MATCH_PHRASE';
SIMPLE_QUERY_STRING: 'SIMPLE_QUERY_STRING';
MATCHQUERY: 'MATCHQUERY';
MATCH_QUERY: 'MATCH_QUERY';
MINUTE_OF_DAY: 'MINUTE_OF_DAY';
Expand Down Expand Up @@ -357,6 +358,8 @@ BIT_XOR_OP: '^';
DOT: '.';
LR_BRACKET: '(';
RR_BRACKET: ')';
LT_SQR_PRTHS: '[';

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where's this name coming from?
Just wondering why not something like: LR_SQUARE_BRACKET and RR_SQUARE_BRACKET

RT_SQR_PRTHS: ']';
COMMA: ',';
SEMI: ';';
AT_SIGN: '@';
Expand Down
20 changes: 19 additions & 1 deletion sql/src/main/antlr/OpenSearchSQLParser.g4
Original file line number Diff line number Diff line change
Expand Up @@ -319,7 +319,10 @@ specificFunction

relevanceFunction
: relevanceFunctionName LR_BRACKET
field=relevanceArgValue COMMA query=relevanceArgValue
field=relevanceField COMMA query=relevanceQuery
(COMMA relevanceArg)* RR_BRACKET
| relevanceFunctionNameEx LR_BRACKET
fields=relevanceFields COMMA query=relevanceQuery
(COMMA relevanceArg)* RR_BRACKET
;

Expand Down Expand Up @@ -386,6 +389,10 @@ relevanceFunctionName
: MATCH
;

relevanceFunctionNameEx
: SIMPLE_QUERY_STRING
;

legacyRelevanceFunctionName
: QUERY | MATCH_QUERY | MATCHQUERY
;
Expand All @@ -408,6 +415,17 @@ relevanceArgName
| BOOST
;

relevanceField
: relevanceArgValue;

relevanceFields
// simple_query_string function accepts list of field as well in format '["field1", "field2", ... ]'
: (LT_SQR_PRTHS relevanceArgValue (COMMA relevanceArgValue)* RT_SQR_PRTHS)
;

relevanceQuery
: relevanceArgValue;

relevanceArgValue
: qualifiedName
| constant
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -51,6 +51,8 @@
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

import org.antlr.v4.runtime.ParserRuleContext;
import org.antlr.v4.runtime.RuleContext;
import org.apache.commons.lang3.tuple.ImmutablePair;
import org.apache.commons.lang3.tuple.Pair;
Expand Down Expand Up @@ -362,8 +364,11 @@ public UnresolvedExpression visitConvertedDataType(

@Override
public UnresolvedExpression visitRelevanceFunction(RelevanceFunctionContext ctx) {
ParserRuleContext func = ctx.relevanceFunctionNameEx();
Copy link

@acarbonetto acarbonetto May 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ParserRuleContext func = ctx.relevanceFunctionNameEx() != null ctx.relevanceFunctionNameEx() : ctx.relevanceFunctionName();

if (func == null)
func = ctx.relevanceFunctionName();
return new Function(
ctx.relevanceFunctionName().getText().toLowerCase(),
func.getText().toLowerCase(),
relevanceArguments(ctx));
}

Expand Down Expand Up @@ -393,8 +398,11 @@ private List<UnresolvedExpression> relevanceArguments(RelevanceFunctionContext c
// all the arguments are defaulted to string values
// to skip environment resolving and function signature resolving
ImmutableList.Builder<UnresolvedExpression> builder = ImmutableList.builder();
ParserRuleContext field = ctx.field;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ParserRuleContext field = ctx.field != null ctx.field : ctx.field2;

if (field == null)
field = ctx.fields;
builder.add(new UnresolvedArgument("field",
new Literal(StringUtils.unquoteText(ctx.field.getText()), DataType.STRING)));
new Literal(StringUtils.unquoteText(field.getText()), DataType.STRING)));
builder.add(new UnresolvedArgument("query",
new Literal(StringUtils.unquoteText(ctx.query.getText()), DataType.STRING)));
ctx.relevanceArg().forEach(v -> builder.add(new UnresolvedArgument(
Expand Down