Improve processing OpenSearch data types. Fix using subfields for text type. #299
core/src/main/java/org/opensearch/sql/expression/operator/convert/TypeCastOperator.java
}
return fieldName;
// Pick first field. What to do if there are multiple fields?
Can you pass in the type, as was done with `convertTextToKeyword`, and map it by finding that type in the list of fields?
Is it needed?
This function is no longer static, so different types can override it if needed. This way we avoid creating new functions like `convertXXXtoYYY` in the future.
Right. But how would it know which type to convert to? For example, an aggregation on a text field with the mapping
{
  "textColumn": {
    "type": "text",
    "fields": {
      "date": {
        "type": "date"
      },
      "keyword": {
        "type": "keyword"
      }
    }
  }
}
will aggregate on `textColumn.date`. What would be expected here?
I'm with @GumpacG on this.
The `keyword` field is a convention in OpenSearch to mean "first bit of the text", and the conversion is "ok, I guess" for legacy's sake, but in general picking the first field would lead to unexpected results that depend on the mapping.
On the other hand, if `fielddata` is set, then it is safe to use the `textColumn` field in this place.
It is possible to aggregate on dates inside a text field.
In 8b0671c I changed it to look for a string subfield, if one is present.
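A minimal sketch of the rule described above (use a string subfield when the mapping has one, otherwise keep the text field's own name), assuming a hypothetical `TextFieldSketch` class that stores the `fields` section of the mapping as a name-to-type map. Treating only `keyword` subfields as "string" subfields is an assumption; the PR's `OpenSearchTextType` may check differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a text type that remembers the subfields declared in its mapping.
class TextFieldSketch {
  // Subfield name -> mapping type, e.g. "date" -> "date", "keyword" -> "keyword".
  private final Map<String, String> fields;

  TextFieldSketch(Map<String, String> fields) {
    // Copy into a LinkedHashMap so iteration follows the order of the map passed in.
    this.fields = new LinkedHashMap<>(fields);
  }

  /** Returns the field name to put into a search query or aggregation. */
  String convertFieldForSearchQuery(String fieldName) {
    for (Map.Entry<String, String> entry : fields.entrySet()) {
      // Assumption: only a keyword subfield is a safe stand-in for the text value;
      // a date or numeric subfield would change what the query matches.
      if ("keyword".equals(entry.getValue())) {
        return fieldName + "." + entry.getKey();
      }
    }
    // No string subfield: use the text field itself. Aggregating on it then fails
    // in OpenSearch unless fielddata is enabled for the field.
    return fieldName;
  }
}
```

With the `textColumn` mapping from this thread, the `date` subfield is skipped and the method returns `textColumn.keyword`; with no subfields it returns `textColumn`, which is the case where OpenSearch rejects an aggregation unless `fielddata` is set.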
Signed-off-by: Yury-Fridlyand <[email protected]>
For docs/dev/img/type-hierarchy-tree-final.png:
- Can you split STRING into TEXT and KEYWORD?
- Can you align DATE and TIME?
- I'm not sure you want STRING --> DATE/TIME/DATETIME/TIMESTAMP, since it's a very specific set of strings that convert. I think that conversion is 'special' and doesn't need to be defined here.
core/src/main/java/org/opensearch/sql/expression/operator/convert/TypeCastOperator.java
## Final type hierarchy scheme

![Most relevant type hierarchy](img/type-hierarchy-tree-final.png)
There's only STRING in that listing. Should we specify TEXT vs KEYWORD there?
There is neither `TEXT` nor `KEYWORD` in `ExprCoreType`.
}

public int hashCode() {
  return 42 + exprCoreType.hashCode();
Would this be considered a magic number that should be defined as a constant?
Maybe https://xkcd.com/221/
Why is this override necessary?
This is needed to make `OpenSearchExprValueFactory::typeActionMap` work properly. Without that override, the lookup always falls through to (lines 208 to 210 in 5232ad2):
throw new IllegalStateException(
    String.format(
        "Unsupported type: %s for value: %s.", type.typeName(), content.objectValue()));
This could be simplified to always return 0 (or any other constant) so that the `equals` check is always performed.
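The reason the override matters can be shown with a plain `HashMap`: `get` first selects a bucket by `hashCode` and only then calls `equals`, so two objects that are equal but hash differently never match. The sketch below uses a hypothetical `MappedType` (a simplified stand-in for `OpenSearchDataType`) and a hypothetical `TypeActionSketch` map in place of `typeActionMap`; it hashes the same fields that `equals` compares instead of reproducing the PR's `42 + exprCoreType.hashCode()`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

enum CoreKind { STRING, DATE }

// Hypothetical, simplified OpenSearchDataType: a mapping type name plus a core type.
final class MappedType {
  final String mappingType;
  final CoreKind exprCoreType;

  MappedType(String mappingType, CoreKind exprCoreType) {
    this.mappingType = mappingType;
    this.exprCoreType = exprCoreType;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof MappedType)) {
      return false;
    }
    MappedType other = (MappedType) o;
    return Objects.equals(mappingType, other.mappingType) && exprCoreType == other.exprCoreType;
  }

  // Without this override, two equal instances usually get different identity hash codes,
  // so HashMap.get looks in the wrong bucket and never reaches equals.
  @Override
  public int hashCode() {
    return Objects.hash(mappingType, exprCoreType);
  }
}

class TypeActionSketch {
  // Hypothetical stand-in for OpenSearchExprValueFactory::typeActionMap.
  private static final Map<MappedType, String> TYPE_ACTIONS = new HashMap<>();

  static {
    TYPE_ACTIONS.put(new MappedType("text", CoreKind.STRING), "parse text");
    TYPE_ACTIONS.put(new MappedType("date", CoreKind.DATE), "parse date");
  }

  static String actionFor(MappedType type) {
    String action = TYPE_ACTIONS.get(type);
    if (action == null) {
      // Same kind of fall-through as the snippet quoted above.
      throw new IllegalStateException(String.format("Unsupported type: %s", type.mappingType));
    }
    return action;
  }
}
```

Returning a constant such as 0, as suggested above, also satisfies the `equals`/`hashCode` contract, but it puts every key in the same bucket, so each lookup has to fall back on `equals` comparisons.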
opensearch/src/main/java/org/opensearch/sql/opensearch/data/type/OpenSearchDataType.java
## Solution

The solution is to provide `:core` with full (non-simplified) types. Those objects should be fully compatible with `ExprCoreType` and implement all the required APIs so that `:core` can work with built-in functions. Once those type objects are returned to `:opensearch`, it can get all the information required to build the correct search request.
non simplified types: enum
full types: Objects
right?
Can we just say that?
simplified types: enum
full types: Objects
Previously, full types were converted to enums before being passed from `:opensearch` to `:core`. With my changes, full types are passed from `:opensearch` to `:core`, and `:core` uses an API call to convert them to an enum value whenever needed (to pick the proper function signature).
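The hand-off described above, sketched under assumptions: `:opensearch` passes a rich type object into `:core`, `:core` asks it for the enum value only when choosing a function signature, and the same object travels back to `:opensearch` with its mapping details intact. `BaseType`, `SketchExprType`, `toCoreType`, `OpenSearchTypeSketch`, and `CoreSketch` are hypothetical names, not the PR's API.

```java
import java.util.Map;

// Simplified stand-in for the ExprCoreType enum.
enum BaseType { STRING, INTEGER, TIMESTAMP }

// Hypothetical view of a type as :core sees it.
interface SketchExprType {
  // Simplified enum value, used only where :core must pick a function signature.
  BaseType toCoreType();
}

// Hypothetical :opensearch type that keeps mapping details :core does not need to understand.
final class OpenSearchTypeSketch implements SketchExprType {
  final String mappingType;
  final Map<String, String> subfields;
  private final BaseType coreType;

  OpenSearchTypeSketch(String mappingType, Map<String, String> subfields, BaseType coreType) {
    this.mappingType = mappingType;
    this.subfields = subfields;
    this.coreType = coreType;
  }

  @Override
  public BaseType toCoreType() {
    return coreType;
  }
}

class CoreSketch {
  // :core resolves the signature from the simplified enum ...
  static String resolveSignature(String function, SketchExprType argType) {
    return function + "(" + argType.toCoreType() + ")";
  }

  // ... but hands the original object back, so :opensearch still sees subfields and formats.
  static SketchExprType passThrough(SketchExprType argType) {
    return argType;
  }
}
```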
opensearch/src/main/java/org/opensearch/sql/opensearch/data/type/OpenSearchDataType.java
        OpenSearchDataType.of(MappingType.GeoPoint)),
    () -> assertNotEquals(OpenSearchDataType.of(MappingType.GeoPoint),
        OpenSearchDataType.of(MappingType.Ip)),
    () -> assertEquals(OpenSearchDataType.of(STRING), OpenSearchDataType.of(STRING)),
I don't understand the purpose of this test
C for coverage.
I had to add 4 tests to satisfy JaCoCo's caprice for line 42 and 4 more tests for line 43 (lines 42 to 43 in 5232ad2):
if (mappingType != null && other.mappingType != null) {
  return mappingType.equals(other.mappingType) && exprCoreType.equals(other.exprCoreType);
opensearch/src/test/java/org/opensearch/sql/opensearch/data/type/OpenSearchDataTypeTest.java
Have you considered creating one class hierarchy for types? Instead of ExprCoreTypes being enum values, make them classes and derive from each other as appropriate.
Singleton instances can still be used for types that do not have parameters, like ints, keyword, etc.
This would simplify a lot of the type comparison logic.
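One possible reading of this suggestion, as a hypothetical sketch that the PR does not implement: types become classes, parameterless types are singletons, and compatibility follows the class hierarchy instead of an explicit parent list. The `SketchType`/`StringType`/`TextType`/`KeywordType` names and the STRING to TEXT/KEYWORD split are assumptions for illustration.

```java
// Hypothetical root of a class-based type hierarchy (the reviewer's idea, not the PR's code).
abstract class SketchType {
  abstract String typeName();

  // True when other is this type or one of its subtypes; an instanceof-style check
  // replaces walking an explicit list of parent types.
  boolean isCompatible(SketchType other) {
    return getClass().isInstance(other);
  }
}

class StringType extends SketchType {
  // Parameterless types can stay singletons, so identity comparison is enough.
  static final StringType INSTANCE = new StringType();

  protected StringType() {}

  @Override
  String typeName() {
    return "STRING";
  }
}

final class TextType extends StringType {
  static final TextType INSTANCE = new TextType();

  private TextType() {}

  @Override
  String typeName() {
    return "TEXT";
  }
}

final class KeywordType extends StringType {
  static final KeywordType INSTANCE = new KeywordType();

  private KeywordType() {}

  @Override
  String typeName() {
    return "KEYWORD";
  }
}
```

Here `StringType.INSTANCE.isCompatible(TextType.INSTANCE)` is true while `TextType` and `KeywordType` are not compatible with each other; because the singletons are unique, no custom `equals` is needed.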
@@ -21,7 +20,8 @@ public interface ExprType {
    * Is compatible with other types.
    */
   default boolean isCompatible(ExprType other) {
-    if (this.equals(other)) {
+    // Do double direction check with `equals`, because a derived class may override it
+    if (this.equals(other) || other.equals(this)) {
By definition, if `this.equals(other)` then `other.equals(this)` must be true.
Do we have `ExprType`s for which this is necessary? If yes, the problem is there.
`other` may be an instance of `OpenSearchDataType`, which has more complex comparison logic.
I have an idea how to fix it; will do soon.
Fixed in b04a92e.
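The reviewer's point refers to the symmetry rule of the `Object.equals` contract. A minimal sketch of how a wrapper type can break it, assuming a hypothetical `KindWrapper` whose `equals` accepts the enum constant it wraps; this illustrates the problem only and does not reproduce the fix in b04a92e.

```java
enum Kind { STRING, DATE }

// Hypothetical wrapper whose equals also accepts the bare enum constant it wraps.
final class KindWrapper {
  final Kind kind;

  KindWrapper(Kind kind) {
    this.kind = kind;
  }

  @Override
  public boolean equals(Object o) {
    if (o instanceof Kind) {
      return kind == o; // the wrapper claims equality with the bare enum ...
    }
    return o instanceof KindWrapper && ((KindWrapper) o).kind == kind;
  }

  @Override
  public int hashCode() {
    return kind.hashCode();
  }
}

class SymmetryDemo {
  static boolean wrapperEqualsEnum(KindWrapper w, Kind k) {
    return w.equals(k);
  }

  // ... but Enum.equals is identity-based (and final), so the reverse check is false.
  static boolean enumEqualsWrapper(KindWrapper w, Kind k) {
    return k.equals(w);
  }
}
```

Since `Enum.equals` is final, the enum side cannot be taught to recognize the wrapper, so a one-way `this.equals(other)` answers differently depending on argument order. A symmetric `equals` makes the two-way check in `isCompatible` unnecessary.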
@@ -36,8 +36,8 @@ public void test_numeric_data_types() throws IOException {
     schema("byte_number", "byte"),
     schema("double_number", "double"),
     schema("float_number", "float"),
-    schema("half_float_number", "float"),
-    schema("scaled_float_number", "double"));
+    schema("half_float_number", "half_float"),
Why is this change necessary?
This is caused by the changes described in #299 (comment).
schema("object_value", "object"),
schema("nested_value", "nested"),
Why is this change necessary?
This is caused by the changes described in #299 (comment).
@@ -56,19 +56,18 @@ public void typeof_opensearch_types() throws IOException {
         + " | fields `double`, `long`, `integer`, `byte`, `short`, `float`, `half_float`, `scaled_float`",
         TEST_INDEX_DATATYPE_NUMERIC));
     verifyDataRows(response,
-        rows("DOUBLE", "LONG", "INTEGER", "BYTE", "SHORT", "FLOAT", "FLOAT", "DOUBLE"));
+        rows("DOUBLE", "LONG", "INTEGER", "BYTE", "SHORT", "FLOAT", "HALF_FLOAT", "SCALED_FLOAT"));
How does this relate to adding the `text` type?
This is caused by the changes described in #299 (comment).
  () -> assertEquals("TIMESTAMP", defaultDateType.typeName()),
  () -> assertEquals("TIME", timeDateType.typeName()),
  () -> assertEquals("DATE", dateDateType.typeName()),
- () -> assertEquals("DATE", datetimeDateType.typeName())
+ () -> assertEquals("TIMESTAMP", datetimeDateType.typeName())
This looks very unrelated to adding the `text` type.
Before:
`OpenSearchDateType` (OSDT) was converted to a simplified type when passed to the `:core` module; in fact, the `ExprCoreType` stored inside it was extracted. Its values were `DATE`/`TIME`/etc. `ExprCoreType` names were used to build the schema in `QueryResponse`, which was later serialized and sent to the user. The `legacyTypeName` method of `ExprType` is used for SQL responses and `typeName` for PPL ones.
In the middle:
- OSDT isn't converted.
- The same methods of OSDT return `mappingType`, which is always `date` regardless of the `ExprCoreType` detected for this field.
Finally:
- Same as above.
- `OpenSearchDateType` overrides these methods to return the `ExprCoreType`.
- No changes for a user!
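The final state can be sketched as a base type whose name comes from the mapping type, plus a date subtype that overrides the name to report the detected core type. `SketchDataType`, `SketchDateType`, and `DateCore` are hypothetical; only `typeName` is modeled, and upper-casing the mapping type name is an assumption based on the `HALF_FLOAT`/`SCALED_FLOAT` values in the integration tests of this PR.

```java
import java.util.Locale;

enum DateCore { DATE, TIME, TIMESTAMP }

// Hypothetical base type: its name is derived from the OpenSearch mapping type.
class SketchDataType {
  final String mappingType;

  SketchDataType(String mappingType) {
    this.mappingType = mappingType;
  }

  // Name used for the response schema, e.g. "HALF_FLOAT" for a half_float field.
  String typeName() {
    return mappingType.toUpperCase(Locale.ROOT);
  }
}

// Hypothetical date type: the mapping type is always "date", the core type depends on the format.
final class SketchDateType extends SketchDataType {
  final DateCore exprCoreType;

  SketchDateType(DateCore exprCoreType) {
    super("date");
    this.exprCoreType = exprCoreType;
  }

  // Report the detected core type instead of the raw mapping type, so users see no change.
  @Override
  String typeName() {
    return exprCoreType.name();
  }
}
```

A `half_float` field reports `HALF_FLOAT`, while a date field whose format resolves to a time reports `TIME` rather than `DATE`, so date fields look the same to users as before.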
/**
 * Perform field name conversion if needed before inserting it into a search query.
 */
default String convertFieldForSearchQuery(String fieldName) {
  return fieldName;
}

/**
 * Perform value conversion if needed before inserting it into a search query.
 */
default Object convertValueForSearchQuery(ExprValue value) {
  return value.value();
}
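The two default methods work as no-op hooks: code that only holds an `ExprType` can call them unconditionally, and only types that need a conversion override them. A minimal sketch with hypothetical names (`QueryType`, `PlainType`, `KeywordBackedText`, `QuerySketch`); the value parameter is simplified from `ExprValue` to `Object`, and the nested map only imitates the shape of a `term` query rather than using the OpenSearch `QueryBuilder` API.

```java
import java.util.Map;

// Hypothetical slice of ExprType with the two hooks from the diff.
interface QueryType {
  default String convertFieldForSearchQuery(String fieldName) {
    return fieldName; // most types query the field as-is
  }

  default Object convertValueForSearchQuery(Object value) {
    return value; // most types send the raw value
  }
}

// A type that relies entirely on the defaults.
final class PlainType implements QueryType {}

// Hypothetical text type that always redirects exact-match queries to a keyword subfield.
final class KeywordBackedText implements QueryType {
  @Override
  public String convertFieldForSearchQuery(String fieldName) {
    return fieldName + ".keyword";
  }
}

class QuerySketch {
  // The builder only sees QueryType, so no instanceof checks or static helpers are needed.
  static Map<String, Map<String, Object>> buildTermQuery(String field, QueryType type, Object value) {
    return Map.of("term",
        Map.of(type.convertFieldForSearchQuery(field), type.convertValueForSearchQuery(value)));
  }
}
```

Because the caller depends only on the interface, adding another type that needs a different field or value conversion does not require touching the query builder.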
It'd be more appropriate for these to be on `OpenSearchDataType`, since they are specific to how we communicate with OpenSearch.
Unfortunately, the code that uses them references `ExprType`, and I'd like to avoid excessive refactoring there (lines 16 to 19 in 6c3744e):
public class LikeQuery extends LuceneQuery {
  @Override
  public QueryBuilder doBuild(String fieldName, ExprType fieldType, ExprValue literal) {
    String field = OpenSearchTextType.convertTextToKeyword(fieldName, fieldType);
Any ideas how to do it gracefully?
opensearch/src/main/java/org/opensearch/sql/opensearch/data/type/OpenSearchDataType.java
@@ -163,8 +208,8 @@ public static OpenSearchDataType of(MappingType mappingType, Map<String, Object>
       case Ip: return OpenSearchIpType.of();
       case Date:
         // Default date formatter is used when "" is passed as the second parameter
-        String format = (String) innerMap.getOrDefault("format", "");
-        return OpenSearchDateType.of(format);
+        return innerMap.isEmpty() ? OpenSearchDateType.of()
Why change this?
This simplifies the creation of `OpenSearchDateType`. A format string goes through a number of checks even when it is empty.
return fieldName + ".keyword";
@Override
public String convertFieldForSearchQuery(String fieldName) {
  if (fields.size() == 0) {
In this case the user will end up with an OpenSearch error about not being able to aggregate on text. Do I get that right?
Yes!
@acarbonetto
Description
See doc for technical details: https://github.com/Bit-Quill/opensearch-project-sql/blob/dev-add-text-type/docs/dev/text-type.md
See also this comment describing some changes.
Issues Resolved
Full types (`OpenSearchDataType`) are passed through the `:core` module instead of simplified ones (`ExprCoreType`). This unblocks access to important mapping info such as text fields or date formats, which is required to build proper DSL queries to OpenSearch.
`keyword` subfield name.
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.