QL: constant_keyword support #53241
Pinging @elastic/es-search (:Search/SQL)
@elasticmachine run elasticsearch-ci/2
Great work! Left some comments.
// To mute tests follow example in file: example.csv-spec

//
// Tests testing field alias (introduced in ES 6.4)
Leftover comment?
@@ -213,6 +214,7 @@ protected Object unwrapMultiValue(Object values) {
protected boolean isFromDocValuesOnly(DataType dataType) {
    return dataType == KEYWORD // because of ignore_above.
        || dataType == DATETIME
        || dataType == CONSTANT_KEYWORD // because a non-existent value is considered the constant value itself
Could you explain please, I don't get it.
We cannot extract the value of a constant_keyword from _source because if there is no source for the field, Elasticsearch will still consider the field as having the constant value.
Take a look at the example in our docs. No value at indexing time means level is debug. So, for the second document in the sample, there is no _source for level, but the value is still debug in case there is a query filtering on that field.
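To make the point above concrete, here is a minimal sketch of the two read paths. It is a simplified model and not the actual QL extraction code: the class name, both method names, and the map standing in for the mapping are hypothetical. It only illustrates why reading from _source returns nothing for a document that never carried the field, while resolving through the field type still yields the constant.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the reasoning in this thread; not the QL implementation.
final class ConstantKeywordExtraction {

    // Reading from _source only sees what was literally sent at index time.
    static Object fromSource(Map<String, Object> source, String field) {
        return source.get(field);
    }

    // Reading through the field type: a constant_keyword field answers with the
    // value configured for the index, whatever the individual document contains.
    // The map is a hypothetical stand-in for the index mapping.
    static Object fromFieldType(Map<String, String> constantFields, String field) {
        return constantFields.get(field);
    }

    public static void main(String[] args) {
        Map<String, String> constantFields = new HashMap<>();
        constantFields.put("level", "debug");

        // Second document of the docs sample: no "level" in _source.
        Map<String, Object> second = new HashMap<>();
        second.put("message", "Starting up Elasticsearch");

        System.out.println(fromSource(second, "level"));            // null
        System.out.println(fromFieldType(constantFields, "level")); // debug
    }
}
```

Under this model, a _source-based extractor would report null for a document that a filter on level = debug nevertheless matches, which is the inconsistency the isFromDocValuesOnly change avoids.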
@@ -198,6 +195,9 @@ private static void loadEmpDatasetIntoEs(RestClient client, String index, String
}
hadLastItem = true;
bulk.append('"').append(titles.get(f)).append("\":\"").append(fields.get(f)).append('"');
if (titles.get(f).equals("gender") && extraFields) {
    bulk.append(",\"extra_gender\":\"M\"");
Personally I would prefer it to be in the employees.csv file and have a different value from gender, like Male and Female.
IMO, doing it like that would be too impactful a change.
Apologies, I mixed up the feature with an enum keyword that can take a set of predefined values in the mapping. It's ok as is, but maybe check if null is also acceptable.
But maybe having it as Male instead of M can help avoid confusion in the future regarding tests?
;

aggWithNullFilter
SELECT COUNT(*) count FROM test_emp_copy WHERE extra_gender IS NOT NULL;
constant_keyword cannot be null, correct?
Theoretically, I think it can be null... haven't tested this.
I guess any type can be null (missing field in the doc)?
For the null case, no document in the index should have a value for that field. As soon as one document is indexed with a value for that field, all other documents will "inherit" that value.
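The "inherit" behaviour described above can be sketched as a small state machine. This is an illustrative model only, assuming a mapping that does not preset the value: the class and method names are hypothetical, and rejecting a differing value reflects the documented behaviour of the field type rather than anything in this PR's code.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model of a constant_keyword field whose value is not preset in the mapping.
final class ConstantKeywordField {
    private final String name;
    private String value; // stays null until some document supplies a value

    ConstantKeywordField(String name) {
        this.name = name;
    }

    // Index one document: the first value seen configures the field for the whole index.
    void index(Map<String, Object> source) {
        Object candidate = source.get(name);
        if (candidate == null) {
            return; // document omits the field, nothing to configure
        }
        if (value == null) {
            value = candidate.toString();
        } else if (!value.equals(candidate.toString())) {
            throw new IllegalArgumentException("[" + name + "] only accepts the value [" + value + "]");
        }
    }

    // The value reported for every document in the index, or null if never configured.
    String valueForAnyDocument() {
        return value;
    }

    public static void main(String[] args) {
        ConstantKeywordField field = new ConstantKeywordField("extra_gender");

        field.index(new HashMap<String, Object>());       // document without the field
        System.out.println(field.valueForAnyDocument());  // null

        Map<String, Object> withValue = new HashMap<>();
        withValue.put("extra_gender", "M");
        field.index(withValue);
        System.out.println(field.valueForAnyDocument());  // M
    }
}
```

In this model IS NOT NULL can only be false for an index where no document ever carried the field, which is the single situation in which the test query above would filter anything out.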
;

functionOverAlias
SELECT BIT_LENGTH(extra_gender) bit FROM test_emp_copy ORDER BY extra_gender LIMIT 1;
If we have 2 values Male/Female that should be changed.
A constant_keyword field has only one value.
@@ -631,6 +632,7 @@ public void testCommonType() {
assertEquals(BOOLEAN, commonType(BOOLEAN, BOOLEAN));
assertEquals(NULL, commonType(NULL, NULL));
assertEquals(INTEGER, commonType(INTEGER, KEYWORD));
assertEquals(DOUBLE, commonType(DOUBLE, CONSTANT_KEYWORD));
Numeric and constant_keyword are not tested in DataTypeConverterTests of QL.
Great set of tests. Left some comments on removal of the types in case of conflict since I think it might remove index information leading to incorrect mapping.
@@ -344,6 +351,11 @@ public static IndexResolution mergedMappings(DataTypeRegistry typeRegistry, Stri
}
}

// if there are both a keyword and a constant_keyword type for this field, only keep the keyword as a common compatible type
if (hasCompatibleKeywords) {
    types.remove(CONSTANT_KEYWORD.esType());
Removing the type altogether might remove useful information.
For example, if a field a is mapped as keyword in index X and constant_keyword in index Y, the piece above will find the collision and remove the field from Y, resulting in a mapping only in X.
On one hand it does indicate that there's only one field a; however, each field has index information underneath, so in this case it would be just for index X. I'm not sure though whether we take that into account or not.
What happens if there's no removal?
This scenario is the one where two indices have the same named field with both keyword and constant_keyword types and their mappings need to be merged. For other scenarios (where the same name is used for two fields with different types) we tell the user that the field cannot be shown (because of the different mapping types), but in this very specific case, and since these two field types are so similar, I chose to display the field with the "common" data type as keyword.
The controversy for what to do in this specific case came from the scenario described here and is specifically tested here.
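The merge rule discussed in the two comments above can be sketched as follows. This is a simplified model, not the IndexResolver code: the map of Elasticsearch type name to the indices using it, the class name, and both method names are assumptions made for illustration. It shows the keyword/constant_keyword pair collapsing to keyword, any other mix of types remaining a conflict, and the index set that survives the removal, which is the detail the reviewer asks about.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Simplified model of merging one field's types across indices; not the QL implementation.
final class KeywordMergeSketch {
    static final String KEYWORD = "keyword";
    static final String CONSTANT_KEYWORD = "constant_keyword";

    // Input: ES type name -> indices mapping the field with that type (hypothetical shape).
    static Map<String, Set<String>> merge(Map<String, Set<String>> typesToIndices) {
        Map<String, Set<String>> types = new LinkedHashMap<>(typesToIndices);
        boolean hasCompatibleKeywords = types.containsKey(KEYWORD) && types.containsKey(CONSTANT_KEYWORD);
        if (hasCompatibleKeywords) {
            // Keep keyword as the common compatible type, as in the diff above.
            types.remove(CONSTANT_KEYWORD);
        }
        return types;
    }

    // Exactly one remaining type means the field can be shown; otherwise it is a conflict (null).
    static String resolvedType(Map<String, Set<String>> typesToIndices) {
        Map<String, Set<String>> merged = merge(typesToIndices);
        return merged.size() == 1 ? merged.keySet().iterator().next() : null;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> fieldA = new LinkedHashMap<>();
        fieldA.put(KEYWORD, Collections.singleton("X"));
        fieldA.put(CONSTANT_KEYWORD, Collections.singleton("Y"));

        System.out.println(resolvedType(fieldA)); // keyword
        System.out.println(merge(fieldA));        // {keyword=[X]}
    }
}
```

In this model the surviving entry lists only index X, matching the reviewer's observation. Whether QL consults that per-type index information after the merge is not settled in this thread.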
LGTM
x-pack/plugin/ql/src/main/java/org/elasticsearch/xpack/ql/type/DataTypeConverter.java
LGTM
LGTM Thank you @astefan!
(cherry picked from commit d6cd4ce)
This PR adds support for the constant_keyword data type following its addition in Elasticsearch with #49713. Addresses #53016