-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implementation for simple_query_string
.
#61
Implementation for simple_query_string
.
#61
Conversation
Signed-off-by: MaxKsyunz <[email protected]>
Signed-off-by: MaxKsyunz <[email protected]>
Signed-off-by: Yury Fridlyand <[email protected]>
Required renaming analyzeSyntax to parse. Signed-off-by: MaxKsyunz <[email protected]>
Signed-off-by: MaxKsyunz <[email protected]>
Signed-off-by: MaxKsyunz <[email protected]>
Signed-off-by: MaxKsyunz <[email protected]>
Added base class that is commont to match_query and match_bool_prefix and others. Signed-off-by: MaxKsyunz <[email protected]>
Signed-off-by: MaxKsyunz <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's discuss adding field parsing to the parser and propagating the parsed result down to SimpleQueryString.
I believe that will result clearer code.
Also, why is the SQL Java CI workflow failing?
@@ -308,6 +319,12 @@ relevanceArgName | |||
| BOOST | |||
; | |||
|
|||
relevanceField | |||
: relevanceArgValue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is too permissive and allows for odd queries like: SIMPLE_QUERY_STRING([true, 3.14], ...)
Field list is a list of weighted field name patterns -- ["field_A^3.4", "field*", fie*^3.45].
Wildcard by itself -- "*" -- is a valid field name as well.
as well as org/opensearch/index/search/SimpleQueryStringQueryParser.java in opensearch code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return new Function( | ||
ctx.relevanceFunctionName().getText().toLowerCase(), | ||
relevanceArguments(ctx)); | ||
if (ctx.relevanceFunctionType1() != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be a visitRelevanceFunctionType1
method that you can override and put the content of this if block.
ctx.relevanceFunctionType1().relevanceFunctionType1Name().getText().toLowerCase(), | ||
relevanceArgumentsType1(ctx.relevanceFunctionType1())); | ||
} | ||
else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be a visitRelevanceFunctionType2
method that you can override and put the content of this else block.
@@ -252,9 +254,16 @@ public UnresolvedExpression visitConvertedDataType(ConvertedDataTypeContext ctx) | |||
|
|||
@Override | |||
public UnresolvedExpression visitRelevanceExpression(RelevanceExpressionContext ctx) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be replaced by two overrides -- see the following comments.
import java.util.function.BiFunction; | ||
import java.util.stream.Collectors; | ||
|
||
public class SimpleQueryString extends LuceneQuery { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be SimpleQueryStringQuery to follow name convention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, fixed in 2b4a99b.
continue; | ||
} | ||
|
||
var parts = elem.split("\\^"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the return value of split? The should be used instead of var.
Also, why "\\^"
instead of ^
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, I dislike the extensive use of var in strongly-typed languages like Java - because I find it makes the code unreadable.
Anybody reading the code shouldn't have to look up the function to find out what the return type is supposed to be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, why
"\\^"
instead of^
?
split
function receives a regex
, so I have to escape ^
, otherwise it would be interpreted as a regex control symbol.
|
||
var parts = elem.split("\\^"); | ||
var weight = Float.parseFloat(parts[parts.length - 1]); | ||
var field = Arrays.stream(parts).limit(parts.length - 1).collect(Collectors.joining("^")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to look at making field weight a first-class parser concept. It's not clear if this implementation can differentiate between field named title^4.3
and weight of 4.3 for field title
.
Is the \\^
above to allow for escaping ^
that is part of field names?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I proposed the following in the opensearch-project -- opensearch-project#182 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the
\\^
above to allow for escaping^
that is part of field names?
No. split
function receives a regex
, so I have to escape ^
, otherwise it would be interpreted as a regex control symbol.
A user can make a search like
simple_query_string(["some_field^1", "my^favourite^field^ever^100500"], ...)
and it would be parsed into following map
/-------------------------v--------\
| field | weight |
>-------------------------+--------<
| some_field | 1 |
| my^favourite^field^ever | 100500 |
\-------------------------^--------/
Signed-off-by: Yury Fridlyand <[email protected]>
@@ -158,6 +161,20 @@ public Expression visitAggregateFunction(AggregateFunction node, AnalysisContext | |||
} | |||
} | |||
|
|||
@Override | |||
public Expression visitLiteralList(LiteralList node, AnalysisContext context) { | |||
var lst = node |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
list
vs lst
? There's really no value to saving a single character in the code.
Code (java especially) should be verbose. list
isn't really verbose either, and you could call it literalListCollection
or similar.
Along those lines, I would also call node
above literalListNode
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lst
holds a list of internal AST objects. The list and the objects are consumed on the next line of code and don't exist anymore, so I called it just lst
and not veryImportantListOfMagicThingsWhatever
.
Re function arguments: have a look on the rest functions in this file. All of them are
public Expression visitSomething(Something node, AnalysisContext context)
So newly added function fits this style. Should I modify all of them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need to fix everything, but we should name our variables appropriately.
core/src/main/java/org/opensearch/sql/ast/expression/LiteralList.java
Outdated
Show resolved
Hide resolved
import org.opensearch.sql.expression.env.Environment; | ||
|
||
/** | ||
* Literal Expression. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Literal List Expression
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, fixed in 84d84e1.
continue; | ||
} | ||
|
||
var parts = elem.split("\\^"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, I dislike the extensive use of var in strongly-typed languages like Java - because I find it makes the code unreadable.
Anybody reading the code shouldn't have to look up the function to find out what the return type is supposed to be.
var res = new HashMap<String, Float>(); | ||
for (var literal : input) { | ||
var elem = literal.stringValue(); | ||
if (!elem.contains("^")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't consider the case where the field contains a "^" in the name. We should be looking to make sure that the element contains a "^" with a float afterwards (\d+.\d+).
This requires regular expressions or tokenizing the string.
private Map<String, Float> parseFields(List<ExprValue> input) { | ||
try { | ||
var res = new HashMap<String, Float>(); | ||
for (var literal : input) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use a .stream().map() here
} | ||
return res; | ||
} | ||
catch (Exception e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we be more specific what we're catching? This will catch everything that turn it into a SemanticCheckException without much explanation.
Signed-off-by: MaxKsyunz <[email protected]>
Signed-off-by: MaxKsyunz <[email protected]>
Signed-off-by: MaxKsyunz <[email protected]>
Signed-off-by: Yury Fridlyand <[email protected]>
Signed-off-by: Yury Fridlyand <[email protected]>
Signed-off-by: Yury Fridlyand <[email protected]>
Signed-off-by: Yury Fridlyand <[email protected]>
Signed-off-by: Yury Fridlyand <[email protected]>
Signed-off-by: Yury Fridlyand <[email protected]>
Signed-off-by: Yury Fridlyand <[email protected]>
Signed-off-by: MaxKsyunz <[email protected]>
Signed-off-by: Yury Fridlyand <[email protected]>
- Two new AST entities: `RelevanceField` and `RelevanceFieldList` (instead of `LiteralList`). - Update in `AstExpressionBuilder`x2. - Rework on `ExpressionAnalyzer` and `SimpleQueryStringQuery`. - Update in `PPL` ANTLR parser. Signed-off-by: Yury Fridlyand <[email protected]>
Removed `RelevanceField` as not needed anymore. Since parsing result is stored in `ExprTupleValue` which is actually a map, so we can keep data there. Signed-off-by: Yury Fridlyand <[email protected]>
Signed-off-by: MaxKsyunz <[email protected]>
|
||
@Override | ||
public List<UnresolvedExpression> getChild() { | ||
return List.of(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the parent function returns null. What's the benefit of returning a list?
// Operators. Bit | ||
|
||
BIT_NOT_OP: '~'; | ||
//BIT_OR_OP: '|'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
commented code - can we remove?
Signed-off-by: MaxKsyunz <[email protected]>
Signed-off-by: Yury Fridlyand <[email protected]>
…ount check. Signed-off-by: Yury Fridlyand <[email protected]>
Signed-off-by: Yury Fridlyand <[email protected]>
Signed-off-by: Yury Fridlyand <[email protected]>
Signed-off-by: Yury Fridlyand <[email protected]>
Codecov Report
@@ Coverage Diff @@
## dev-simple_query_string-#192 #61 +/- ##
==================================================================
+ Coverage 97.66% 97.70% +0.04%
- Complexity 2743 2767 +24
==================================================================
Files 268 267 -1
Lines 6772 6891 +119
Branches 435 436 +1
==================================================================
+ Hits 6614 6733 +119
Misses 157 157
Partials 1 1
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
Signed-off-by: Yury Fridlyand <[email protected]>
Another example to show how to set custom values for the optional parameters:: | ||
|
||
os> select firstname, lastname, city, address from accounts where simple_query_string(['firstname', city ^ 2], 'Amber Nogal', analyzer=keyword, default_operator='AND'); | ||
fetched rows / total rows = 0/0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should consider adding some additional test data for these tests. Perhaps out of the scope of this ticket but it would be nice having clear statements that show usage of these tests by returning exemplary data. It seems hard to do this with the optional parameters and the small amount of queryable data we currently have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I consider using more complicated samples as I did in integration test. Prior to that we need to reduce the data set to have it short and simple. See b2639e3.
Signed-off-by: Yury Fridlyand <[email protected]>
…v-simple_query_string-#192-impl2 Signed-off-by: Yury Fridlyand <[email protected]>
…ava`. Signed-off-by: Yury Fridlyand <[email protected]>
Signed-off-by: Yury Fridlyand <[email protected]>
.entrySet() | ||
.stream() | ||
.collect(Collectors.toMap( | ||
n -> (String) ((Literal) n.getKey()).getValue(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be great if we could do this without the casts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to convert there Map<UnresolvedExpression, UnresolvedExpression>
into a Map<String, ExprValue>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First cast can be simplified to n.getKey().toString()
, but I don't see a way to simplify the second one.
return fieldList | ||
.entrySet() | ||
.stream() | ||
.map(e -> String.format("\"%s\" ^ %s", e.getKey().toString(), e.getValue().toString())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe toString
is unnecessary here -- String.format
takes care of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, fixed in 71bf20c.
@@ -45,7 +45,7 @@ public List<ExprValue> collectionValue() { | |||
public String toString() { | |||
return valueList.stream() | |||
.map(Object::toString) | |||
.collect(Collectors.joining(",", "[", "]")); | |||
.collect(Collectors.joining(", ", "[", "]")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this change necessary to make a test pass?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it just makes the string representation a bit more readable. I was the first user of it.
@@ -59,7 +61,7 @@ public void contextShouldIncludeAllFieldsAfterVisitingIndexNameInFromClause() { | |||
Map<String, Type> typeByName = context.peek().resolveAll(Namespace.FIELD_NAME); | |||
assertThat( | |||
typeByName, | |||
allOf( | |||
Matchers.<Map<String, Type>>allOf( | |||
aMapWithSize(21), | |||
hasEntry("semantics", new OpenSearchIndex("semantics", INDEX)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is different in upstream/main
.
The (Type)
cast is removed by IntelliJ's Code Cleanup which then required addition of th
I don't know how you ran into this but that's what I experienced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, fixed in 65ec516.
numParameters <= MIN_NUM_PARAMETERS + numOptionalParameters; | ||
numParameters++) { | ||
List<ExprType> args = new ArrayList<>(Collections.nCopies(numParameters, STRING)); | ||
List<ExprType> args = new ArrayList<>(Collections.nCopies(numParameters - 1, STRING)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is -1
necessary?
upstream/main
does not have -1
here nor on line 84 above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I add one more parameter on the next line. It may be of another type, so I introduced additional function argument.
I agree that it is an ugly implementation.
@@ -116,7 +119,7 @@ public void defineFieldSymbolShouldBeAbleToResolveAll() { | |||
Map<String, Type> typeByName = environment().resolveAll(Namespace.FIELD_NAME); | |||
assertThat( | |||
typeByName, | |||
allOf( | |||
Matchers.<Map<String, Type>>allOf( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as SemanticAnalyzerBasicTest.java
this file is different in upstream/main
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, fixed in 65ec516.
Signed-off-by: Yury Fridlyand <[email protected]>
…ace. Signed-off-by: Yury Fridlyand <[email protected]>
Signed-off-by: Yury Fridlyand <[email protected]>
Signed-off-by: Yury Fridlyand <[email protected]>
Signed-off-by: Yury Fridlyand <[email protected]>
|
||
private static FunctionResolver match_bool_prefix() { | ||
FunctionName name = BuiltinFunctionName.MATCH_BOOL_PREFIX.getName(); | ||
// TODO: Create different resolver more suited for relevance functions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is caused by the merge. Fixed in cba5749, the merge was reverted.
build.gradle
Outdated
@@ -6,9 +6,9 @@ | |||
|
|||
buildscript { | |||
ext { | |||
opensearch_version = System.getProperty("opensearch.version", "2.0.0-rc1-SNAPSHOT") | |||
opensearch_version = System.getProperty("opensearch.version", "2.0.0-SNAPSHOT") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure why this is in your PR - is this due to a downmerge from main?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is caused by the merge. Fixed in cba5749, the merge was reverted.
// Test is disabled because DSL and AstDSL expressions always has fields in the different order | ||
// regardless of how they are specified | ||
// Test is disabled because DSL and AstDSL expressions sometimes has fields in the different order | ||
// regardless of how they are specified. Test passes and fails in a flaky behavior. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unfortunate.
Can we raise a follow-up ticket to resolve this in a timely manner?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 048984c.
…utableMap` from Guava. Signed-off-by: Yury Fridlyand <[email protected]>
Signed-off-by: Yury Fridlyand <[email protected]>
@@ -6,9 +6,9 @@ | |||
|
|||
buildscript { | |||
ext { | |||
opensearch_version = System.getProperty("opensearch.version", "2.0.0-SNAPSHOT") | |||
opensearch_version = System.getProperty("opensearch.version", "2.0.0-rc1-SNAPSHOT") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will revert changes that are in the upstream/main branch.
Let's be careful that we aren't undoing Joshua's work.
Reimplemented in #66. |
Signed-off-by: Yury Fridlyand [email protected]
Description
Rework on #53 and #55.
Issues Resolved
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.