Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Nested Function Support In SELECT Clause #1490

Merged
Original file line number Diff line number Diff line change
Expand Up @@ -366,6 +366,14 @@ public LogicalPlan visitProject(Project node, AnalysisContext context) {
List<NamedExpression> namedExpressions =
selectExpressionAnalyzer.analyze(node.getProjectList(), context,
new ExpressionReferenceOptimizer(expressionAnalyzer.getRepository(), child));

for (UnresolvedExpression expr : node.getProjectList()) {
NestedAnalyzer nestedAnalyzer = new NestedAnalyzer(
namedExpressions, expressionAnalyzer, child
);
child = nestedAnalyzer.analyze(expr, context);
}
Comment on lines +370 to +375
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we generate a single LogicalNested here rather than get them merged later in MergeNestedAndNested?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will update to create the single LogicalNested here rather than merging further down query execution.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I'm just thinking if we have all the project items and always merge multiple LogicalNested later, we may be able to save that optimizer rule.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked the latest revision and wondering if the for loop is needed? Because we generate single nested operator with all namedExpressions right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes the for loop is still needed. The namedExpressions are passed in each time for project push down, but we are fulfilling the LogicalNested nested fields by each nested Function in the projectList. Both namedExpressions and nested functions are required.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using for-loop inside NesteAnalyzer, and change interface?

child = nestedAnalyzer.analyze(node.getProjectList(), context);

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be my preference to keep the for-loop as is so we don't have to complicate adding additional arguments to an already formed LogicalPlan. If we move the for-loop into the NestedAnalyzer we will have to do some not-so-nice logic in NestedAnalyzer:Analyze to aggregate the arguments from the analyzed LogicalPlan's.
NestedAnalyzer.java#L36-L74
If you would prefer this implementation I can make the necessary revisions!


// new context
context.push();
TypeEnvironment newEnv = context.peek();
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -416,17 +416,6 @@ private Expression visitIdentifier(String ident, AnalysisContext context) {
ReferenceExpression ref = DSL.ref(ident,
typeEnv.resolve(new Symbol(Namespace.FIELD_NAME, ident)));

// Fall back to old engine too if type is not supported semantically
if (isTypeNotSupported(ref.type())) {
throw new SyntaxCheckException(String.format(
"Identifier [%s] of type [%s] is not supported yet", ident, ref.type()));
}
return ref;
}

// Array type is not supporte yet.
private boolean isTypeNotSupported(ExprType type) {
return "array".equalsIgnoreCase(type.typeName());
}

}
111 changes: 111 additions & 0 deletions core/src/main/java/org/opensearch/sql/analysis/NestedAnalyzer.java
Original file line number Diff line number Diff line change
@@ -0,0 +1,111 @@
/*
* Copyright OpenSearch Contributors
* SPDX-License-Identifier: Apache-2.0
*/

package org.opensearch.sql.analysis;

import static org.opensearch.sql.data.type.ExprCoreType.STRING;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import lombok.RequiredArgsConstructor;
import org.opensearch.sql.ast.AbstractNodeVisitor;
import org.opensearch.sql.ast.expression.Alias;
import org.opensearch.sql.ast.expression.Function;
import org.opensearch.sql.ast.expression.QualifiedName;
import org.opensearch.sql.ast.expression.UnresolvedExpression;
import org.opensearch.sql.expression.NamedExpression;
import org.opensearch.sql.expression.ReferenceExpression;
import org.opensearch.sql.expression.function.BuiltinFunctionName;
import org.opensearch.sql.planner.logical.LogicalNested;
import org.opensearch.sql.planner.logical.LogicalPlan;

/**
* Analyze the Nested Function in the {@link AnalysisContext} to construct the {@link
* LogicalPlan}.
*/
@RequiredArgsConstructor
public class NestedAnalyzer extends AbstractNodeVisitor<LogicalPlan, AnalysisContext> {
private final List<NamedExpression> namedExpressions;
private final ExpressionAnalyzer expressionAnalyzer;
private final LogicalPlan child;

public LogicalPlan analyze(UnresolvedExpression projectItem, AnalysisContext context) {
LogicalPlan nested = projectItem.accept(this, context);
return (nested == null) ? child : nested;
}

@Override
public LogicalPlan visitAlias(Alias node, AnalysisContext context) {
return node.getDelegated().accept(this, context);
}

@Override
public LogicalPlan visitFunction(Function node, AnalysisContext context) {
if (node.getFuncName().equalsIgnoreCase(BuiltinFunctionName.NESTED.name())) {

List<UnresolvedExpression> expressions = node.getFuncArgs();
validateArgs(expressions);
ReferenceExpression nestedField =
(ReferenceExpression)expressionAnalyzer.analyze(expressions.get(0), context);
Map<String, ReferenceExpression> args;
if (expressions.size() == 2) {
args = Map.of(
"field", nestedField,
"path", (ReferenceExpression)expressionAnalyzer.analyze(expressions.get(1), context)
);
} else {
args = Map.of(
"field", (ReferenceExpression)expressionAnalyzer.analyze(expressions.get(0), context),
"path", generatePath(nestedField.toString())
);
}
if (child instanceof LogicalNested) {
((LogicalNested)child).addFields(args);
return child;
} else {
return new LogicalNested(child, new ArrayList<>(Arrays.asList(args)), namedExpressions);
}
}
return null;
}

/**
* Validate each parameter used in nested function in SELECT clause. Any supplied parameter
* for a nested function in a SELECT statement must be a valid qualified name, and the field
* parameter must be nested at least one level.
* @param args : Arguments in nested function.
*/
private void validateArgs(List<UnresolvedExpression> args) {
if (args.size() < 1 || args.size() > 2) {
throw new IllegalArgumentException(
"on nested object only allowed 2 parameters (field,path) or 1 parameter (field)"
);
}

for (int i = 0; i < args.size(); i++) {
if (!(args.get(i) instanceof QualifiedName)) {
throw new IllegalArgumentException(
String.format("Illegal nested field name: %s", args.get(i).toString())
);
}
if (i == 0 && ((QualifiedName)args.get(i)).getParts().size() < 2) {
throw new IllegalArgumentException(
String.format("Illegal nested field name: %s", args.get(i).toString())
);
}
}
}

/**
* Generate nested path dynamically. Assumes at least one level of nesting in supplied string.
* @param field : Nested field to generate path of.
* @return : Path of field derived from last level of nesting.
*/
private ReferenceExpression generatePath(String field) {
return new ReferenceExpression(field.substring(0, field.lastIndexOf(".")), STRING);
}
}
7 changes: 7 additions & 0 deletions core/src/main/java/org/opensearch/sql/executor/Explain.java
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@
import org.opensearch.sql.planner.physical.EvalOperator;
import org.opensearch.sql.planner.physical.FilterOperator;
import org.opensearch.sql.planner.physical.LimitOperator;
import org.opensearch.sql.planner.physical.NestedOperator;
import org.opensearch.sql.planner.physical.PhysicalPlan;
import org.opensearch.sql.planner.physical.PhysicalPlanNodeVisitor;
import org.opensearch.sql.planner.physical.ProjectOperator;
Expand Down Expand Up @@ -142,6 +143,12 @@ public ExplainResponseNode visitLimit(LimitOperator node, Object context) {
"limit", node.getLimit(), "offset", node.getOffset())));
}

@Override
public ExplainResponseNode visitNested(NestedOperator node, Object context) {
return explain(node, context, explanNode -> explanNode.setDescription(ImmutableMap.of(
"nested", node.getFields())));
}

protected ExplainResponseNode explain(PhysicalPlan node, Object context,
Consumer<ExplainResponseNode> doExplain) {
ExplainResponseNode explainNode = new ExplainResponseNode(getOperatorName(node));
Expand Down
4 changes: 4 additions & 0 deletions core/src/main/java/org/opensearch/sql/expression/DSL.java
Original file line number Diff line number Diff line change
Expand Up @@ -624,6 +624,10 @@ public static FunctionExpression xor(Expression... expressions) {
return compile(FunctionProperties.None, BuiltinFunctionName.XOR, expressions);
}

public static FunctionExpression nested(Expression... expressions) {
return compile(FunctionProperties.None, BuiltinFunctionName.NESTED, expressions);
}

public static FunctionExpression not(Expression... expressions) {
return compile(FunctionProperties.None, BuiltinFunctionName.NOT, expressions);
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@
import lombok.RequiredArgsConstructor;
import org.opensearch.sql.data.model.ExprTupleValue;
import org.opensearch.sql.data.model.ExprValue;
import org.opensearch.sql.data.type.ExprCoreType;
import org.opensearch.sql.data.type.ExprType;
import org.opensearch.sql.expression.env.Environment;

Expand Down Expand Up @@ -100,7 +101,12 @@ public ExprValue resolve(ExprTupleValue value) {
}

private ExprValue resolve(ExprValue value, List<String> paths) {
final ExprValue wholePathValue = value.keyValue(String.join(PATH_SEP, paths));
ExprValue wholePathValue = value.keyValue(String.join(PATH_SEP, paths));
// For array types only first index currently supported.
Yury-Fridlyand marked this conversation as resolved.
Show resolved Hide resolved
if (value.type().equals(ExprCoreType.ARRAY)) {
wholePathValue = value.collectionValue().get(0).keyValue(paths.get(0));
}

if (!wholePathValue.isMissing() || paths.size() == 1) {
return wholePathValue;
} else {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -189,6 +189,8 @@ public enum BuiltinFunctionName {
STDDEV_POP(FunctionName.of("stddev_pop")),
// take top documents from aggregation bucket.
TAKE(FunctionName.of("take")),
// Not always an aggregation query
NESTED(FunctionName.of("nested")),

/**
* Text Functions.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@
import lombok.Getter;
import lombok.Setter;
import lombok.experimental.UtilityClass;
import org.apache.commons.lang3.tuple.Pair;
import org.opensearch.sql.data.model.ExprValue;
import org.opensearch.sql.data.type.ExprType;
import org.opensearch.sql.expression.Expression;
Expand Down Expand Up @@ -47,6 +48,8 @@ public void register(BuiltinFunctionRepository repository) {
repository.register(score(BuiltinFunctionName.SCORE));
repository.register(score(BuiltinFunctionName.SCOREQUERY));
repository.register(score(BuiltinFunctionName.SCORE_QUERY));
// Functions supported in SELECT clause
repository.register(nested());
}

private static FunctionResolver match_bool_prefix() {
Expand Down Expand Up @@ -93,6 +96,36 @@ private static FunctionResolver wildcard_query(BuiltinFunctionName wildcardQuery
return new RelevanceFunctionResolver(funcName);
}

private static FunctionResolver nested() {
return new FunctionResolver() {
@Override
public Pair<FunctionSignature, FunctionBuilder> resolve(
FunctionSignature unresolvedSignature) {
return Pair.of(unresolvedSignature,
(functionProperties, arguments) ->
new FunctionExpression(BuiltinFunctionName.NESTED.getName(), arguments) {
@Override
public ExprValue valueOf(Environment<Expression, ExprValue> valueEnv) {
return valueEnv.resolve(getArguments().get(0));
dai-chen marked this conversation as resolved.
Show resolved Hide resolved
}

@Override
public ExprType type() {
return getArguments().get(0).type();
}
});
}

@Override
public FunctionName getFunctionName() {
return BuiltinFunctionName.NESTED.getName();
}
};
}




private static FunctionResolver score(BuiltinFunctionName score) {
FunctionName funcName = score.getName();
return new RelevanceFunctionResolver(funcName);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@
import org.opensearch.sql.planner.logical.LogicalEval;
import org.opensearch.sql.planner.logical.LogicalFilter;
import org.opensearch.sql.planner.logical.LogicalLimit;
import org.opensearch.sql.planner.logical.LogicalNested;
import org.opensearch.sql.planner.logical.LogicalPlan;
import org.opensearch.sql.planner.logical.LogicalPlanNodeVisitor;
import org.opensearch.sql.planner.logical.LogicalProject;
Expand All @@ -26,6 +27,7 @@
import org.opensearch.sql.planner.physical.EvalOperator;
import org.opensearch.sql.planner.physical.FilterOperator;
import org.opensearch.sql.planner.physical.LimitOperator;
import org.opensearch.sql.planner.physical.NestedOperator;
import org.opensearch.sql.planner.physical.PhysicalPlan;
import org.opensearch.sql.planner.physical.ProjectOperator;
import org.opensearch.sql.planner.physical.RareTopNOperator;
Expand Down Expand Up @@ -94,6 +96,11 @@ public PhysicalPlan visitEval(LogicalEval node, C context) {
return new EvalOperator(visitChild(node, context), node.getExpressions());
}

@Override
public PhysicalPlan visitNested(LogicalNested node, C context) {
return new NestedOperator(visitChild(node, context), node.getFields());
}

@Override
public PhysicalPlan visitSort(LogicalSort node, C context) {
return new SortOperator(visitChild(node, context), node.getSortList());
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
/*
* Copyright OpenSearch Contributors
* SPDX-License-Identifier: Apache-2.0
*/

package org.opensearch.sql.planner.logical;

import java.util.Collections;
import java.util.List;
import java.util.Map;
import lombok.EqualsAndHashCode;
import lombok.Getter;
import lombok.ToString;
import org.opensearch.sql.expression.NamedExpression;
import org.opensearch.sql.expression.ReferenceExpression;

/**
* Logical Nested plan.
*/
@EqualsAndHashCode(callSuper = true)
@Getter
@ToString
public class LogicalNested extends LogicalPlan {
forestmvey marked this conversation as resolved.
Show resolved Hide resolved
private List<Map<String, ReferenceExpression>> fields;
private final List<NamedExpression> projectList;

/**
* Constructor of LogicalNested.
*
*/
public LogicalNested(
LogicalPlan childPlan,
List<Map<String, ReferenceExpression>> fields,
List<NamedExpression> projectList
) {
super(Collections.singletonList(childPlan));
this.fields = fields;
this.projectList = projectList;
}

public void addFields(Map<String, ReferenceExpression> fields) {
this.fields.add(fields);
}

@Override
public <R, C> R accept(LogicalPlanNodeVisitor<R, C> visitor, C context) {
return visitor.visitNested(this, context);
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -74,6 +74,14 @@ public LogicalPlan highlight(LogicalPlan input, Expression field,
return new LogicalHighlight(input, field, arguments);
}


public static LogicalPlan nested(
LogicalPlan input,
List<Map<String, ReferenceExpression>> nestedArgs,
List<NamedExpression> projectList) {
return new LogicalNested(input, nestedArgs, projectList);
}

public static LogicalPlan remove(LogicalPlan input, ReferenceExpression... fields) {
return new LogicalRemove(input, ImmutableSet.copyOf(fields));
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -73,6 +73,10 @@ public R visitEval(LogicalEval plan, C context) {
return visitNode(plan, context);
}

public R visitNested(LogicalNested plan, C context) {
return visitNode(plan, context);
}

public R visitSort(LogicalSort plan, C context) {
return visitNode(plan, context);
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -56,6 +56,7 @@ public static LogicalPlanOptimizer create() {
TableScanPushDown.PUSH_DOWN_SORT,
TableScanPushDown.PUSH_DOWN_LIMIT,
TableScanPushDown.PUSH_DOWN_HIGHLIGHT,
TableScanPushDown.PUSH_DOWN_NESTED,
TableScanPushDown.PUSH_DOWN_PROJECT,
new CreateTableWriteBuilder()));
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@
import org.opensearch.sql.planner.logical.LogicalFilter;
import org.opensearch.sql.planner.logical.LogicalHighlight;
import org.opensearch.sql.planner.logical.LogicalLimit;
import org.opensearch.sql.planner.logical.LogicalNested;
import org.opensearch.sql.planner.logical.LogicalPlan;
import org.opensearch.sql.planner.logical.LogicalProject;
import org.opensearch.sql.planner.logical.LogicalRelation;
Expand Down Expand Up @@ -65,6 +66,13 @@ public static <T extends LogicalPlan> Pattern<LogicalHighlight> highlight(Patter
return Pattern.typeOf(LogicalHighlight.class).with(source(pattern));
}

/**
* Logical nested operator with a given pattern on inner field.
*/
public static <T extends LogicalPlan> Pattern<LogicalNested> nested(Pattern<T> pattern) {
return Pattern.typeOf(LogicalNested.class).with(source(pattern));
}

/**
* Logical project operator with a given pattern on inner field.
*/
Expand Down
Loading