-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
semantic_text as string type in ES|QL - support for functions and operators #115243
Conversation
@@ -123,7 +123,7 @@ protected static void bytesRefs( | |||
Function<DataType, DataType> expectedDataType, | |||
BiFunction<Integer, Stream<BytesRef>, Matcher<Object>> matcher | |||
) { | |||
for (DataType type : new DataType[] { DataType.KEYWORD, DataType.TEXT, DataType.IP, DataType.VERSION }) { | |||
for (DataType type : new DataType[] { DataType.KEYWORD, DataType.TEXT, DataType.IP, DataType.VERSION, DataType.SEMANTIC_TEXT }) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the mv functions, I only modified the tests that explicitly tested for keyword/text fields to also add tests for semantic_text. For most of the mv function tests I did not need to make any changes because I modified the bytesRef
method in the parent abstract class.
mv_concat
and mv_zip
do not use bytesRef
, but use the DataType.isString
method to only test string types, so implicitly they tests semantic_text
:
Lines 34 to 44 in f32051f
for (DataType fieldType : DataType.types()) { | |
if (DataType.isString(fieldType) == false) { | |
continue; | |
} | |
for (DataType delimType : DataType.types()) { | |
if (DataType.isString(delimType) == false) { | |
continue; | |
} | |
for (int l = 1; l < 10; l++) { | |
int length = l; | |
suppliers.add(new TestCaseSupplier(fieldType + "/" + l + " " + delimType, List.of(fieldType, delimType), () -> { |
Lines 37 to 44 in f32051f
for (DataType leftType : DataType.types()) { | |
if (leftType != DataType.NULL && DataType.isString(leftType) == false) { | |
continue; | |
} | |
for (DataType rightType : DataType.types()) { | |
if (rightType != DataType.NULL && DataType.isString(rightType) == false) { | |
continue; | |
} |
the rest of the mv functions that take string params use bytesRef
and did not need modifications:
Line 33 in f32051f
bytesRefs(cases, "mv_count", "MvCount", t -> DataType.INTEGER, (size, values) -> equalTo(Math.toIntExact(values.count()))); |
Line 40 in f32051f
bytesRefs(cases, "mv_dedupe", "MvDedupe", (size, values) -> getMatcher(values)); |
Line 34 in f32051f
bytesRefs(cases, "mv_first", "MvFirst", Function.identity(), (size, values) -> equalTo(values.findFirst().get())); |
Line 34 in f32051f
bytesRefs(cases, "mv_last", "MvLast", Function.identity(), (size, values) -> equalTo(values.reduce((f, s) -> s).get())); |
Line 35 in f32051f
bytesRefs(cases, "mv_max", "MvMax", (size, values) -> equalTo(values.max(Comparator.naturalOrder()).get())); |
Line 35 in f32051f
bytesRefs(cases, "mv_min", "MvMin", (size, values) -> equalTo(values.min(Comparator.naturalOrder()).get())); |
@@ -217,8 +217,8 @@ public static Iterable<Object[]> parameters() { | |||
} | |||
|
|||
private static String typeErrorString = | |||
"boolean, cartesian_point, cartesian_shape, datetime, date_nanos, double, geo_point, geo_shape, integer, ip, keyword, long, text, " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for the comparison operators tests I did not need to make any other changes because they already use TestCase.stringCases
:
Lines 147 to 154 in f32051f
suppliers.addAll( | |
TestCaseSupplier.stringCases( | |
Object::equals, | |
(lhsType, rhsType) -> "EqualsKeywordsEvaluator[lhs=Attribute[channel=0], rhs=Attribute[channel=1]]", | |
List.of(), | |
DataType.BOOLEAN | |
) | |
); |
The same applies for the rest of the comparison predicates:
Lines 137 to 144 in f32051f
suppliers.addAll( | |
TestCaseSupplier.stringCases( | |
(l, r) -> ((BytesRef) l).compareTo((BytesRef) r) >= 0, | |
(lhsType, rhsType) -> "GreaterThanOrEqualKeywordsEvaluator[lhs=Attribute[channel=0], rhs=Attribute[channel=1]]", | |
List.of(), | |
DataType.BOOLEAN | |
) | |
); |
Lines 137 to 144 in f32051f
suppliers.addAll( | |
TestCaseSupplier.stringCases( | |
(l, r) -> ((BytesRef) l).compareTo((BytesRef) r) > 0, | |
(lhsType, rhsType) -> "GreaterThanKeywordsEvaluator[lhs=Attribute[channel=0], rhs=Attribute[channel=1]]", | |
List.of(), | |
DataType.BOOLEAN | |
) | |
); |
Lines 137 to 144 in f32051f
suppliers.addAll( | |
TestCaseSupplier.stringCases( | |
(l, r) -> ((BytesRef) l).compareTo((BytesRef) r) <= 0, | |
(lhsType, rhsType) -> "LessThanOrEqualKeywordsEvaluator[lhs=Attribute[channel=0], rhs=Attribute[channel=1]]", | |
List.of(), | |
DataType.BOOLEAN | |
) | |
); |
Lines 137 to 144 in f32051f
suppliers.addAll( | |
TestCaseSupplier.stringCases( | |
(l, r) -> ((BytesRef) l).compareTo((BytesRef) r) < 0, | |
(lhsType, rhsType) -> "LessThanKeywordsEvaluator[lhs=Attribute[channel=0], rhs=Attribute[channel=1]]", | |
List.of(), | |
DataType.BOOLEAN | |
) | |
); |
Lines 145 to 151 in f32051f
suppliers.addAll( | |
TestCaseSupplier.stringCases( | |
(l, r) -> false == l.equals(r), | |
(lhsType, rhsType) -> "NotEqualsKeywordsEvaluator[lhs=Attribute[channel=0], rhs=Attribute[channel=1]]", | |
List.of(), | |
DataType.BOOLEAN | |
) |
TestCaseSupplier
uses DataType.stringTypes()
which contains semantic_text
:
Lines 84 to 95 in f32051f
public static List<TestCaseSupplier> stringCases( | |
BinaryOperator<Object> expected, | |
BiFunction<DataType, DataType, String> evaluatorToString, | |
List<String> warnings, | |
DataType expectedType | |
) { | |
List<TypedDataSupplier> lhsSuppliers = new ArrayList<>(); | |
List<TypedDataSupplier> rhsSuppliers = new ArrayList<>(); | |
List<TestCaseSupplier> suppliers = new ArrayList<>(); | |
for (DataType type : DataType.stringTypes()) { | |
lhsSuppliers.addAll(stringCases(type)); | |
rhsSuppliers.addAll(stringCases(type)); |
@@ -173,3 +173,946 @@ host:keyword | semantic_text_field:semantic_text | |||
"host2" | all we have to decide is what to do with the time that is given to us | |||
"host3" | be excellent to each other | |||
; | |||
|
|||
case |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tests for Conditional functions and expressions
3 | null | ||
; | ||
|
||
convertToBool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tests for Type conversion functions
3 | null | ||
; | ||
|
||
concat |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tests for String functions
live long and prosper | ||
; | ||
|
||
mvAppend |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tests for Multi value functions
3 | null | ||
; | ||
|
||
equalityWithConstant |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tests for Binary operators - the ones that support string types
3 | null | ||
; | ||
|
||
isNull |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tests for ES|QL predicates
@elasticmachine update branch |
Pinging @elastic/es-analytical-engine (Team:Analytics) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few thoughts from someone who's been working on data types for the past couple of months. Hopefully adding the new type hasn't been too difficult. Please feel free to ping me directly if you have questions.
@@ -369,6 +370,9 @@ public static DataType commonType(DataType left, DataType right) { | |||
if (left == TEXT || right == TEXT) { | |||
return TEXT; | |||
} | |||
if (left == SEMANTIC_TEXT || right == SEMANTIC_TEXT) { | |||
return TEXT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that if the two types are the same, we always return that type, so this is saying that only in case of mixed type operations, we expect SEMANTIC_TEXT
to cast to TEXT
. It seems a little odd to me that KEYWORD == SEMANTIC_TEXT
would have a common type of TEXT
, but maybe that's expected? If so, I think a comment would help here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed this so that the common type is KEYWORD
in the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we shouldn't do the same for TEXT (ie. return KEYWORD). Not part of this PR, though, but worth considering.
...src/test/java/org/elasticsearch/xpack/esql/expression/function/AbstractFunctionTestCase.java
Outdated
Show resolved
Hide resolved
...test/java/org/elasticsearch/xpack/esql/expression/function/scalar/conditional/CaseTests.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM - great work on supporting this new data type! 💯
Left a question about the data type to use when combining with other types
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/type/EsqlDataTypeConverter.java
Show resolved
Hide resolved
@@ -99,7 +100,7 @@ private static TestCaseSupplier supplier(String name, DataType type, int length, | |||
|
|||
private static void add(List<TestCaseSupplier> suppliers, String name, int length, Supplier<String> valueSupplier) { | |||
Map<Integer, List<List<DataType>>> permutations = new HashMap<Integer, List<List<DataType>>>(); | |||
List<DataType> supportedDataTypes = List.of(DataType.KEYWORD, DataType.TEXT); | |||
List<DataType> supportedDataTypes = List.of(DataType.KEYWORD, DataType.TEXT, DataType.SEMANTIC_TEXT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
List<DataType> supportedDataTypes = List.of(DataType.KEYWORD, DataType.TEXT, DataType.SEMANTIC_TEXT); | |
List<DataType> supportedDataTypes = List.of(DataType.stringTypes()); |
@craigtaverner do you have time to take a look at this one? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Although I'd prefer new lines containing SEMANTIC_TEXT to be inserted immediately after the similar TEXT lines instead of later in the code. Keeping SEMANTIC_TEXT close to TEXT feels better. This also applies to block of code.
@@ -369,6 +370,9 @@ public static DataType commonType(DataType left, DataType right) { | |||
if (left == TEXT || right == TEXT) { | |||
return TEXT; | |||
} | |||
if (left == SEMANTIC_TEXT || right == SEMANTIC_TEXT) { | |||
return TEXT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we shouldn't do the same for TEXT (ie. return KEYWORD). Not part of this PR, though, but worth considering.
@@ -43,7 +43,8 @@ public class NotEquals extends EsqlBinaryComparison implements Negatable<EsqlBin | |||
Map.entry(DataType.KEYWORD, NotEqualsKeywordsEvaluator.Factory::new), | |||
Map.entry(DataType.TEXT, NotEqualsKeywordsEvaluator.Factory::new), | |||
Map.entry(DataType.VERSION, NotEqualsKeywordsEvaluator.Factory::new), | |||
Map.entry(DataType.IP, NotEqualsKeywordsEvaluator.Factory::new) | |||
Map.entry(DataType.IP, NotEqualsKeywordsEvaluator.Factory::new), | |||
Map.entry(DataType.SEMANTIC_TEXT, NotEqualsKeywordsEvaluator.Factory::new) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this line should be after the TEXT line.
@@ -38,7 +38,8 @@ public class LessThanOrEqual extends EsqlBinaryComparison implements Negatable<E | |||
Map.entry(DataType.KEYWORD, LessThanOrEqualKeywordsEvaluator.Factory::new), | |||
Map.entry(DataType.TEXT, LessThanOrEqualKeywordsEvaluator.Factory::new), | |||
Map.entry(DataType.VERSION, LessThanOrEqualKeywordsEvaluator.Factory::new), | |||
Map.entry(DataType.IP, LessThanOrEqualKeywordsEvaluator.Factory::new) | |||
Map.entry(DataType.IP, LessThanOrEqualKeywordsEvaluator.Factory::new), | |||
Map.entry(DataType.SEMANTIC_TEXT, LessThanOrEqualKeywordsEvaluator.Factory::new) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this line should be after the TEXT line.
@elasticmachine update branch |
💔 Backport failedThe backport operation could not be completed due to the following error:
You can use sqren/backport to manually backport by running |
…rators (elastic#115243) * fix tests * Add CSV tests * Add function tests * Refactor tests * spotless * Use DataType.stringTypes() where possible * Add tests for conditional functions and expressions * Fix tests after merge * Reorder semantic_text evaluators and tests * Re-ordered two more places for SEMANTIC_TEXT after TEXT --------- Co-authored-by: Elastic Machine <[email protected]> Co-authored-by: Craig Taverner <[email protected]>
…rators (elastic#115243) * fix tests * Add CSV tests * Add function tests * Refactor tests * spotless * Use DataType.stringTypes() where possible * Add tests for conditional functions and expressions * Fix tests after merge * Reorder semantic_text evaluators and tests * Re-ordered two more places for SEMANTIC_TEXT after TEXT --------- Co-authored-by: Elastic Machine <[email protected]> Co-authored-by: Craig Taverner <[email protected]>
* semantic_text as string type in ES|QL - support for functions and operators (#115243) * fix tests * Add CSV tests * Add function tests * Refactor tests * spotless * Use DataType.stringTypes() where possible * Add tests for conditional functions and expressions * Fix tests after merge * Reorder semantic_text evaluators and tests * Re-ordered two more places for SEMANTIC_TEXT after TEXT --------- Co-authored-by: Elastic Machine <[email protected]> Co-authored-by: Craig Taverner <[email protected]> * Fix release tests for semantic_text (#116202) --------- Co-authored-by: Elastic Machine <[email protected]> Co-authored-by: Craig Taverner <[email protected]>
tracked in #115103
Related: #114334 that disallows functions to return
TEXT
type values. We need to pay attention to this one, it will likely be merged first.Description
This adds support for
semantic_text
as a string type in ES|QL and checks the support for functions and operators (https://www.elastic.co/guide/en/elasticsearch/reference/master/esql-functions-operators.html).The process was:
semantic_text
a string type:elasticsearch/x-pack/plugin/esql-core/src/main/java/org/elasticsearch/xpack/esql/core/type/DataType.java
Lines 372 to 377 in 167e55b
semantic_text
; fix any failuressemantic_text
ensure we have test coverage when the arguments are of typesemantic_text
Next
This PR takes care of only functions/operator support. Things that are not covered by tests here, but will be in a follow up:
BUCKET
docsIf preferred I can make the rest of the changes here to ensure the support for commands too, but I figured it might be easier to review this in 2 parts.