Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integ match bool prefix #187 #634

Merged
Merged
Show file tree
Hide file tree
Changes from 38 commits
Commits
Show all changes
41 commits
Select commit Hold shift + click to select a range
a422774
Support MATCH_BOOL_PREFIX in SQL parser.
May 17, 2022
838da0f
Add an interface for PPL and SQL parsers.
May 17, 2022
f57429c
Implement Parser interface on PPLSyntaxParser.
May 18, 2022
e80a334
Add base class for all SQL tests.
May 18, 2022
531f31c
Add match_bool_prefix to list of known functions.
May 18, 2022
1b85a9e
Add match_bool_prefix to DSL.
May 18, 2022
dd4f77b
Add MatchBoolPrefixQuery query builder.
May 18, 2022
1ba94d9
Add MatchBoolPrefixQuery to FilterQueryBuilder.
May 18, 2022
1855e39
Address PR comments.
May 19, 2022
4321fc1
100% unit test coverage for RelevanceQuery class.
May 19, 2022
92087d9
Add integration test for match_bool_prefix in SQL.
May 19, 2022
56ead1b
Enable MatchBoolPrefixParserTest tests since they now pass.
May 20, 2022
a149dd7
Add user documentation for match_bool_prefix.
May 27, 2022
5ccadd5
Merge from upstream/main and resolve conflicts.
May 30, 2022
028281b
Merge branch 'opensearch-project-main'
May 30, 2022
2f664ac
Adding parameter options parser tests
forestmvey May 26, 2022
9f565b3
Add match_bool_prefix optional parameters to relevance function
forestmvey May 27, 2022
46b1116
Adding optional parameter integration tests
forestmvey May 27, 2022
0ba4e35
Updating match_bool_prefix documentation
forestmvey May 27, 2022
5798b06
Adding match_bool_prefix unit tests
forestmvey May 30, 2022
dfad749
Fixing documentation test
forestmvey May 30, 2022
70f6a21
Adding missing newline
forestmvey May 31, 2022
ef085e0
Merge from upstream/main and resolve conflicts. Part 2.
May 31, 2022
500c554
Fixing compilation errors, checkstyle errors, and code coverage
forestmvey Jun 1, 2022
9c102d6
Merge branch 'dev-match_bool_prefix-#187' into dev-match_bool_prefix-…
forestmvey Jun 1, 2022
7315f49
Aggregated additional parameters into one test and renamed test function
forestmvey Jun 2, 2022
3743bbd
Reverting unnecessary file formatting
forestmvey Jun 2, 2022
3f6ac10
Merge pull request #64 from Bit-Quill/dev-match_bool_prefix-2#187
forestmvey Jun 2, 2022
f792e95
Adding PPL syntax, documentation, integration tests, and PPL tests
forestmvey Jun 3, 2022
b625276
Make PPL integration test queries more appropriately match relevance …
forestmvey Jun 6, 2022
9b0eeb9
Merge pull request #67 from Bit-Quill/dev-match_bool_prefix-ppl#187
forestmvey Jun 6, 2022
0bcc123
Merge pull request #69 from Bit-Quill/dev-match_bool_prefix-#187
forestmvey Jun 6, 2022
37992ac
Add analyzer and operator parameters to match_bool_prefix in sql and ppl
Jun 16, 2022
111a8e6
Update SQL and PPL documentation.
Jun 17, 2022
2d2ddf8
Merge pull request #72 from Bit-Quill/dev-match_bool_prefix-#187-more…
MaxKsyunz Jun 17, 2022
0a08ce4
Merge remote-tracking branch 'upstream/main' into integ-match_bool_pr…
Jun 18, 2022
eefa7e6
Adding newline to functions.rst to fix doctest.
Jun 18, 2022
6a8fa69
Update test to use PPLSyntaxParser.parse instead of analyzeSyntax.
Jun 18, 2022
76b9158
Merge remote-tracking branch 'upstream/main' into integ-match_bool_pr…
Jun 23, 2022
686e9f2
Update analyzeSyntax to parse in PPL parser unit tests.
Jun 23, 2022
df48828
Move limitations section to the end of the document.
Jun 23, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
package org.opensearch.sql.common.antlr;

import org.antlr.v4.runtime.tree.ParseTree;

public interface Parser {
ParseTree parse(String query);
}
4 changes: 4 additions & 0 deletions core/src/main/java/org/opensearch/sql/expression/DSL.java
Original file line number Diff line number Diff line change
Expand Up @@ -662,6 +662,10 @@ public FunctionExpression simple_query_string(Expression... args) {
return compile(BuiltinFunctionName.SIMPLE_QUERY_STRING, args);
}

public FunctionExpression match_bool_prefix(Expression... args) {
return compile(BuiltinFunctionName.MATCH_BOOL_PREFIX, args);
}

private FunctionExpression compile(BuiltinFunctionName bfn, Expression... args) {
return (FunctionExpression) repository.compile(bfn.getName(), Arrays.asList(args.clone()));
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -190,6 +190,7 @@ public enum BuiltinFunctionName {
SIMPLE_QUERY_STRING(FunctionName.of("simple_query_string")),
MATCH_PHRASE(FunctionName.of("match_phrase")),
MATCHPHRASE(FunctionName.of("matchphrase")),
MATCH_BOOL_PREFIX(FunctionName.of("match_bool_prefix")),

/**
* Legacy Relevance Function.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@
import static org.opensearch.sql.data.type.ExprCoreType.STRING;
import static org.opensearch.sql.data.type.ExprCoreType.STRUCT;

import com.google.common.collect.ImmutableList;
import com.google.common.collect.ImmutableMap;
import java.util.ArrayList;
import java.util.Collections;
Expand All @@ -27,6 +28,7 @@
public class OpenSearchFunctions {

public static final int MATCH_MAX_NUM_PARAMETERS = 14;
public static final int MATCH_BOOL_PREFIX_MAX_NUM_PARAMETERS = 9;
public static final int MATCH_PHRASE_MAX_NUM_PARAMETERS = 5;
public static final int MIN_NUM_PARAMETERS = 2;
public static final int SIMPLE_QUERY_STRING_MAX_NUM_PARAMETERS = 14;
Expand All @@ -35,6 +37,7 @@ public class OpenSearchFunctions {
* Add functions specific to OpenSearch to repository.
*/
public void register(BuiltinFunctionRepository repository) {
repository.register(match_bool_prefix());
repository.register(match());
repository.register(simple_query_string());
// Register MATCHPHRASE as MATCH_PHRASE as well for backwards
Expand All @@ -43,6 +46,11 @@ public void register(BuiltinFunctionRepository repository) {
repository.register(match_phrase(BuiltinFunctionName.MATCHPHRASE));
}

private static FunctionResolver match_bool_prefix() {
FunctionName name = BuiltinFunctionName.MATCH_BOOL_PREFIX.getName();
return getRelevanceFunctionResolver(name, MATCH_BOOL_PREFIX_MAX_NUM_PARAMETERS, STRING);
}

private static FunctionResolver match() {
FunctionName funcName = BuiltinFunctionName.MATCH.getName();
return getRelevanceFunctionResolver(funcName, MATCH_MAX_NUM_PARAMETERS, STRING);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -342,6 +342,17 @@ public void named_non_parse_expression() {
assertAnalyzeEqual(DSL.ref("string_field", STRING), qualifiedName("string_field"));
}

@Test
void match_bool_prefix_expression() {
assertAnalyzeEqual(
dsl.match_bool_prefix(
dsl.namedArgument("field", DSL.literal("fieldA")),
dsl.namedArgument("query", DSL.literal("sample query"))),
AstDSL.function("match_bool_prefix",
AstDSL.unresolvedArg("field", stringLiteral("fieldA")),
AstDSL.unresolvedArg("query", stringLiteral("sample query"))));
}

@Test
void visit_span() {
assertAnalyzeEqual(
Expand Down
41 changes: 41 additions & 0 deletions docs/user/dql/functions.rst
Original file line number Diff line number Diff line change
Expand Up @@ -2233,6 +2233,47 @@ Another example to show how to set custom values for the optional parameters::
+----------------------+--------------------------+


MATCH_BOOL_PREFIX
-----

Description
>>>>>>>>>>>

``match_bool_prefix(field_expression, query_expression)``

The match_bool_prefix function maps to the match_bool_prefix query in the search engine. match_bool_prefix creates a match query from all but the last term in the query string. The last term is used to create a prefix query.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know if the meaning of these params documented in OpenSearch doc or not? I don't see it here: https://opensearch.org/docs/latest/opensearch/query-dsl/full-text/#match-boolean-prefix. I'm thinking if it's missing, probably we can add it in our doc.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know if the meaning of these params documented in OpenSearch doc or not? I don't see it here: https://opensearch.org/docs/latest/opensearch/query-dsl/full-text/#match-boolean-prefix. I'm thinking if it's missing, probably we can add it in our doc.

@dai-chen
Optional arguments are available in the options section of that page.

https://opensearch.org/docs/latest/opensearch/query-dsl/full-text/#options

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know if the meaning of these params documented in OpenSearch doc or not? I don't see it here: https://opensearch.org/docs/latest/opensearch/query-dsl/full-text/#match-boolean-prefix. I'm thinking if it's missing, probably we can add it in our doc.

@dai-chen Optional arguments are available in the options section of that page.

https://opensearch.org/docs/latest/opensearch/query-dsl/full-text/#options

Got it. I missed it. Thanks!


- fuzziness
- max_expansions
- prefix_length
- fuzzy_transpositions
- fuzzy_rewrite
- minimum_should_match
- boost
- operator
- analyzer

Example with only ``field`` and ``query`` expressions, and all other parameters are set default values::

os> SELECT firstname, address FROM accounts WHERE match_bool_prefix(address, 'Bristol Stre');
fetched rows / total rows = 2/2
+-------------+--------------------+
| firstname | address |
|-------------+--------------------|
| Hattie | 671 Bristol Street |
| Nanette | 789 Madison Street |
+-------------+--------------------+

Another example to show how to set custom values for the optional parameters::

os> SELECT firstname, address FROM accounts WHERE match_bool_prefix(address, 'Bristol Street', minimum_should_match=2);
fetched rows / total rows = 1/1
+-------------+--------------------+
| firstname | address |
|-------------+--------------------|
| Hattie | 671 Bristol Street |
+-------------+--------------------+

SIMPLE_QUERY_STRING
-------------------

Expand Down
44 changes: 43 additions & 1 deletion docs/user/ppl/functions/relevance.rst
Original file line number Diff line number Diff line change
Expand Up @@ -153,4 +153,46 @@ Another example to show how to set custom values for the optional parameters::
Limitations
>>>>>>>>>>>

The relevance functions are available to execute only in OpenSearch DSL but not in memory as of now, so the relevance search might fail for queries that are too complex to translate into DSL if the relevance function is following after a complex PPL query. To make your queries always work-able, it is recommended to place the relevance commands as close to the search command as possible, to ensure the relevance functions are eligible to push down. For example, a complex query like ``search source = people | rename firstname as name | dedup account_number | fields name, account_number, balance, employer | where match(employer, 'Open Search') | stats count() by city`` could fail because it is difficult to translate to DSL, but it would be better if we rewrite it to an equivalent query as ``search source = people | where match(employer, 'Open Search') | rename firstname as name | dedup account_number | fields name, account_number, balance, employer | stats count() by city`` by moving the where command with relevance function to the second command right after the search command, and the relevance would be optimized and executed smoothly in OpenSearch DSL. See `Optimization <../../optimization/optimization.rst>`_ to get more details about the query engine optimization.
The relevance functions are available to execute only in OpenSearch DSL but not in memory as of now, so the relevance search might fail for queries that are too complex to translate into DSL if the relevance function is following after a complex PPL query. To make your queries always work-able, it is recommended to place the relevance commands as close to the search command as possible, to ensure the relevance functions are eligible to push down. For example, a complex query like ``search source = people | rename firstname as name | dedup account_number | fields name, account_number, balance, employer | where match(employer, 'Open Search') | stats count() by city`` could fail because it is difficult to translate to DSL, but it would be better if we rewrite it to an equivalent query as ``search source = people | where match(employer, 'Open Search') | rename firstname as name | dedup account_number | fields name, account_number, balance, employer | stats count() by city`` by moving the where command with relevance function to the second command right after the search command, and the relevance would be optimized and executed smoothly in OpenSearch DSL. See `Optimization <../../optimization/optimization.rst>`_ to get more details about the query engine optimization.


MATCH_BOOL_PREFIX
-----

Description
>>>>>>>>>>>

``match_bool_prefix(field_expression, query_expression)``

The match_bool_prefix function maps to the match_bool_prefix query in the search engine. match_bool_prefix creates a match query from all but the last term in the query string. The last term is used to create a prefix query.

- analyzer
- fuzziness
- max_expansions
- prefix_length
- fuzzy_transpositions
- operator
- fuzzy_rewrite
- minimum_should_match
- boost

Example with only ``field`` and ``query`` expressions, and all other parameters are set default values::

os> source=accounts | where match_bool_prefix(address, 'Bristol Stre') | fields firstname, address
fetched rows / total rows = 2/2
+-------------+--------------------+
| firstname | address |
|-------------+--------------------|
| Hattie | 671 Bristol Street |
| Nanette | 789 Madison Street |
+-------------+--------------------+

Another example to show how to set custom values for the optional parameters::

os> source=accounts | where match_bool_prefix(address, 'Bristol Stre', minimum_should_match = 2) | fields firstname, address
fetched rows / total rows = 1/1
+-------------+--------------------+
| firstname | address |
|-------------+--------------------|
| Hattie | 671 Bristol Street |
+-------------+--------------------+
Original file line number Diff line number Diff line change
Expand Up @@ -480,7 +480,7 @@ public void castStatementInWhereClauseLessThanConstantTest() {
public void castStatementInWhereClauseDatetimeCastTest() {
JSONObject response = executeJdbcRequest("SELECT date_keyword FROM "
+ TestsConstants.TEST_INDEX_DATE
+ " WHERE (CAST(date_keyword AS DATETIME) = \'2014-08-19T07:09:13.434Z\')");
+ " WHERE (CAST(date_keyword AS DATETIME) = '2014-08-19T07:09:13.434Z')");

String schema_result = "{\"name\":\"date_keyword\",\"type\":\"keyword\"}";
assertEquals(response.getJSONArray("schema").get(0).toString(), schema_result);
Expand Down Expand Up @@ -704,7 +704,7 @@ public void ifFuncShouldPassJDBC() {
JSONObject response = executeJdbcRequest(
"SELECT IF(age > 30, 'True', 'False') AS Ages FROM " + TEST_INDEX_ACCOUNT
+ " WHERE age IS NOT NULL GROUP BY Ages");
assertEquals("IF(age > 30, \'True\', \'False\')", response.query("/schema/0/name"));
assertEquals("IF(age > 30, 'True', 'False')", response.query("/schema/0/name"));
assertEquals("Ages", response.query("/schema/0/alias"));
assertEquals("keyword", response.query("/schema/0/type"));
}
Expand Down Expand Up @@ -742,7 +742,7 @@ public void ifnullShouldPassJDBC() throws IOException {
JSONObject response = executeJdbcRequest(
"SELECT IFNULL(lastname, 'unknown') AS name FROM " + TEST_INDEX_ACCOUNT
+ " GROUP BY name");
assertEquals("IFNULL(lastname, \'unknown\')", response.query("/schema/0/name"));
assertEquals("IFNULL(lastname, 'unknown')", response.query("/schema/0/name"));
assertEquals("name", response.query("/schema/0/alias"));
assertEquals("keyword", response.query("/schema/0/type"));
}
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
/*
* Copyright OpenSearch Contributors
* SPDX-License-Identifier: Apache-2.0
*/

package org.opensearch.sql.ppl;

import static org.opensearch.sql.legacy.TestsConstants.TEST_INDEX_PHRASE;
import static org.opensearch.sql.util.MatcherUtils.rows;
import static org.opensearch.sql.util.MatcherUtils.verifyDataRows;

import java.io.IOException;
import org.json.JSONObject;
import org.junit.Test;

public class MatchBoolPrefixIT extends PPLIntegTestCase {

@Override
public void init() throws IOException {
loadIndex(Index.PHRASE);
}

@Test
public void valid_query_match_test() throws IOException {
JSONObject result =
executeQuery(
String.format(
"source=%s | where match_bool_prefix(phrase, 'qui') | fields phrase",
TEST_INDEX_PHRASE));

verifyDataRows(result,
rows("quick fox"),
rows("quick fox here"));
}

@Test
public void optional_parameter_match_test() throws IOException {
JSONObject result =
executeQuery(
String.format(
"source=%s | where match_bool_prefix(phrase, '2 tes', minimum_should_match=1, fuzziness=2) | fields phrase",
TEST_INDEX_PHRASE));

verifyDataRows(result,
rows("my test"),
rows("my test 2"));
}

@Test
public void no_matches_test() throws IOException {
JSONObject result =
executeQuery(
String.format(
"source=%s | where match_bool_prefix(phrase, 'rice') | fields phrase",
TEST_INDEX_PHRASE));

assertEquals(0, result.getInt("total"));
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -45,7 +45,7 @@ public void ifnullShouldPassJDBC() throws IOException {
JSONObject response = executeJdbcRequest(
"SELECT IFNULL(lastname, 'unknown') AS name FROM " + TEST_INDEX_ACCOUNT
+ " GROUP BY name");
assertEquals("IFNULL(lastname, \'unknown\')", response.query("/schema/0/name"));
assertEquals("IFNULL(lastname, 'unknown')", response.query("/schema/0/name"));
assertEquals("name", response.query("/schema/0/alias"));
assertEquals("keyword", response.query("/schema/0/type"));
}
Expand Down Expand Up @@ -92,7 +92,7 @@ public void ifnullWithMissingInputTest() {
public void nullifShouldPassJDBC() throws IOException {
JSONObject response = executeJdbcRequest(
"SELECT NULLIF(lastname, 'unknown') AS name FROM " + TEST_INDEX_ACCOUNT);
assertEquals("NULLIF(lastname, \'unknown\')", response.query("/schema/0/name"));
assertEquals("NULLIF(lastname, 'unknown')", response.query("/schema/0/name"));
assertEquals("name", response.query("/schema/0/alias"));
assertEquals("keyword", response.query("/schema/0/type"));
}
Expand Down Expand Up @@ -181,8 +181,8 @@ public void isnullWithMathExpr() throws IOException{
@Test
public void ifShouldPassJDBC() throws IOException {
JSONObject response = executeJdbcRequest(
"SELECT IF(2 > 0, \'hello\', \'world\') AS name FROM " + TEST_INDEX_ACCOUNT);
assertEquals("IF(2 > 0, \'hello\', \'world\')", response.query("/schema/0/name"));
"SELECT IF(2 > 0, 'hello', 'world') AS name FROM " + TEST_INDEX_ACCOUNT);
assertEquals("IF(2 > 0, 'hello', 'world')", response.query("/schema/0/name"));
assertEquals("name", response.query("/schema/0/alias"));
assertEquals("keyword", response.query("/schema/0/type"));
}
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,55 @@
/*
* Copyright OpenSearch Contributors
* SPDX-License-Identifier: Apache-2.0
*/

package org.opensearch.sql.sql;

import static org.opensearch.sql.legacy.TestsConstants.TEST_INDEX_PHRASE;
import static org.opensearch.sql.util.MatcherUtils.rows;
import static org.opensearch.sql.util.MatcherUtils.schema;
import static org.opensearch.sql.util.MatcherUtils.verifyDataRows;
import static org.opensearch.sql.util.MatcherUtils.verifySchema;

import java.io.IOException;
import org.json.JSONObject;
import org.junit.Test;
import org.opensearch.sql.legacy.SQLIntegTestCase;

public class MatchBoolPrefixIT extends SQLIntegTestCase {
public void init() throws IOException {
loadIndex(SQLIntegTestCase.Index.PHRASE);
}

@Test
public void query_matches_test() throws IOException {
String query = "SELECT phrase FROM "
+ TEST_INDEX_PHRASE + " WHERE match_bool_prefix(phrase, 'quick')";
var result = new JSONObject(executeQuery(query, "jdbc"));
verifySchema(result, schema("phrase", "text"));

verifyDataRows(result,
rows("quick fox"),
rows("quick fox here"));
}

@Test
public void additional_parameters_test() throws IOException {
String query = "SELECT phrase FROM "
+ TEST_INDEX_PHRASE + " WHERE match_bool_prefix(phrase, '2 test', minimum_should_match=1, fuzziness=2)";
var result = new JSONObject(executeQuery(query, "jdbc"));
verifySchema(result, schema("phrase", "text"));

verifyDataRows(result,
rows("my test"),
rows("my test 2"));
}

@Test
public void no_matches_test() throws IOException {
String query = "SELECT * FROM "
+ TEST_INDEX_PHRASE + " WHERE match_bool_prefix(phrase, 'rice')";
var result = new JSONObject(executeQuery(query, "jdbc"));
assertEquals(0, result.getInt("total"));
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -594,4 +594,4 @@ public void fieldWithSpacesInNameShouldPass() {
Assert.assertSame(TEXT, type.get());
}

}
}
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@
public class EnvironmentTest {

/** Use context class for push/pop */
private SemanticContext context = new SemanticContext();
private final SemanticContext context = new SemanticContext();

@Test
public void defineFieldSymbolInDifferentEnvironmentsShouldBeAbleToResolve() {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@
import org.opensearch.sql.opensearch.storage.script.filter.lucene.RangeQuery.Comparison;
import org.opensearch.sql.opensearch.storage.script.filter.lucene.TermQuery;
import org.opensearch.sql.opensearch.storage.script.filter.lucene.WildcardQuery;
import org.opensearch.sql.opensearch.storage.script.filter.lucene.relevance.MatchBoolPrefixQuery;
import org.opensearch.sql.opensearch.storage.script.filter.lucene.relevance.MatchPhraseQuery;
import org.opensearch.sql.opensearch.storage.script.filter.lucene.relevance.MatchQuery;
import org.opensearch.sql.opensearch.storage.script.filter.lucene.relevance.SimpleQueryStringQuery;
Expand Down Expand Up @@ -60,6 +61,7 @@ public class FilterQueryBuilder extends ExpressionNodeVisitor<QueryBuilder, Obje
.put(BuiltinFunctionName.MATCH_QUERY.getName(), new MatchQuery())
.put(BuiltinFunctionName.MATCHQUERY.getName(), new MatchQuery())
.put(BuiltinFunctionName.SIMPLE_QUERY_STRING.getName(), new SimpleQueryStringQuery())
.put(BuiltinFunctionName.MATCH_BOOL_PREFIX.getName(), new MatchBoolPrefixQuery())
.build();

/**
Expand Down
Loading