ESQL: Validate unique plan attribute names #110488

Merged
Changes from 37 commits
Commits
43 commits
134a3e8
Unique output attribute names after optimization
alex-spies Jul 4, 2024
9d9c70f
Enforce unique row attribute names in verifier
alex-spies Jul 4, 2024
0d4e1df
Update docs/changelog/110488.yaml
alex-spies Jul 4, 2024
6f36c29
Add tests for grok, dissect, enrich
alex-spies Jul 5, 2024
cd48514
Add tests for keep
alex-spies Jul 5, 2024
3a5dab7
Make row consistent with other plans
alex-spies Jul 5, 2024
f71ef42
Update docs/changelog/110488.yaml
alex-spies Jul 5, 2024
42be4eb
Add test for drop, rename and stats
alex-spies Jul 5, 2024
2a0d630
Merge remote-tracking branch 'upstream/main' into validate-unique-pla…
alex-spies Jul 5, 2024
11da00c
Add test dataset with deeper field hierarchy
alex-spies Jul 9, 2024
0d6758e
Add hierarchical shadowing test for eval
alex-spies Jul 9, 2024
0e13eaf
Add hierarchical tests for drop, dissect
alex-spies Jul 9, 2024
878eb99
Add hierarchical tests for enrich
alex-spies Jul 9, 2024
e8609f2
Add hierarchical tests for grok, keep
alex-spies Jul 9, 2024
5b4be72
Add hierarchical tests for rename, row
alex-spies Jul 9, 2024
2f1778a
Add more extreme case for stats
alex-spies Jul 9, 2024
73a9736
Add new capability for this fix
alex-spies Jul 9, 2024
e3531ff
Merge remote-tracking branch 'upstream/main' into validate-unique-pla…
alex-spies Jul 9, 2024
2cb23f5
Fix EsRelation.equals, mutation in ResolveUnionTypes
alex-spies Jul 9, 2024
222698c
Merge remote-tracking branch 'upstream/main' into validate-unique-pla…
alex-spies Jul 10, 2024
1a648e4
Make union types use unique attribute names
alex-spies Jul 11, 2024
658bebc
Cleanup leftover
alex-spies Jul 11, 2024
e2760a5
Merge remote-tracking branch 'upstream/main' into validate-unique-pla…
alex-spies Jul 11, 2024
6ac0fa9
Revert "Cleanup leftover"
alex-spies Jul 15, 2024
0f3c274
Revert "Make union types use unique attribute names"
alex-spies Jul 15, 2024
091c099
Revert "Fix EsRelation.equals, mutation in ResolveUnionTypes"
alex-spies Jul 15, 2024
f478d59
More ENRICH tests with internal shadowing
alex-spies Jul 15, 2024
6f00534
More consistent test names
alex-spies Jul 15, 2024
535c954
More KEEP tests
alex-spies Jul 15, 2024
fb85e16
Update docs
alex-spies Jul 15, 2024
6f98cde
Add more tests
alex-spies Jul 15, 2024
fdcc40e
Improve doc wording
alex-spies Jul 15, 2024
3779900
Improve GROK docs
alex-spies Jul 15, 2024
07e9584
Merge remote-tracking branch 'upstream/main' into validate-unique-pla…
alex-spies Jul 15, 2024
32f76a0
Merge remote-tracking branch 'upstream/main' into validate-unique-pla…
alex-spies Jul 16, 2024
94737ff
Update RENAME docs and tests
alex-spies Jul 16, 2024
ea9b9a9
Avoid duplicate field attribs from union type res
alex-spies Jul 16, 2024
f5d9568
Fix leftovers
alex-spies Jul 16, 2024
a387165
Make tests deterministic
alex-spies Jul 16, 2024
233d68d
Fix rename shadowing docs
alex-spies Jul 16, 2024
d0723b0
Apply Liam's doc remarks
alex-spies Jul 16, 2024
fb17126
Don't describe KEEP precedence twice
alex-spies Jul 17, 2024
bc354f4
Merge remote-tracking branch 'upstream/main' into validate-unique-pla…
alex-spies Jul 17, 2024
6 changes: 6 additions & 0 deletions docs/changelog/110488.yaml
@@ -0,0 +1,6 @@
pr: 110488
summary: "ESQL: Validate unique plan attribute names"
area: ES|QL
type: bug
issues:
- 110541
2 changes: 2 additions & 0 deletions docs/reference/esql/processing-commands/dissect.asciidoc
@@ -20,6 +20,8 @@ multiple values, `DISSECT` will process each value.

`pattern`::
A <<esql-dissect-patterns,dissect pattern>>.
In case a field name coincides with an existing column, the existing column is discarded.
Contributor suggested change: If a field name conflicts with an existing column, the existing column is discarded.
If a field name is used more than once, only the rightmost duplicate creates a column.

`<separator>`::
A string used as the separator between appended values, when using the <<esql-append-modifier,append modifier>>.
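A minimal sketch of the rightmost-duplicate rule above, modeled on the shadowingInternal dissect csv-spec test added further down (the ROW value is illustrative only):

ROW name = "Georgi1 Facello"
// only the rightmost %{foo} in the pattern creates the foo column
| DISSECT name "%{foo} %{foo}"
;

name:keyword    | foo:keyword
Georgi1 Facello | Facello
;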
7 changes: 6 additions & 1 deletion docs/reference/esql/processing-commands/enrich.asciidoc
@@ -31,11 +31,16 @@ name as the `match_field` defined in the <<esql-enrich-policy,enrich policy>>.
The enrich fields from the enrich index that are added to the result as new
columns. If a column with the same name as the enrich field already exists, the
existing column will be replaced by the new column. If not specified, each of
the enrich fields defined in the policy is added
the enrich fields defined in the policy is added.
If a column has the same name as the enrich field, it will be discarded unless
that field is given a new name.
Contributor suggested change: A column with the same name as the enrich field will be discarded unless the enrich field is renamed.


`new_nameX`::
Enables you to change the name of the column that's added for each of the enrich
fields. Defaults to the enrich field name.
If a column has the same name as the new name, it will be discarded.
If a name (new or original) occurs more than once, only the rightmost duplicate
creates a new column.

*Description*

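The rightmost-duplicate rule for `WITH` clauses is exercised by the shadowingInternal enrich csv-spec test added further down, repeated here as a sketch (it assumes the city_names enrich policy from the test fixtures):

ROW city = "Zürich"
// both WITH entries target x; only the rightmost one (region) creates the column
| ENRICH city_names ON city WITH x = airport, x = region
;

city:keyword | x:text
Zürich       | Bezirk Zürich
;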
4 changes: 3 additions & 1 deletion docs/reference/esql/processing-commands/eval.asciidoc
@@ -16,10 +16,12 @@ EVAL [column1 =] value1[, ..., [columnN =] valueN]

`columnX`::
The column name.
In case a column name coincides with an existing column, the existing column is discarded.
@leemthompo (Contributor, Jul 16, 2024) suggested change: If a column exists with the same name, the existing column is discarded.
Author reply: I'll go for "If a column with the same name already exists, the existing column is dropped." if that works for you.

If a column name is used more than once, only the rightmost duplicate creates a column.

`valueX`::
The value for the column. Can be a literal, an expression, or a
<<esql-functions,function>>.
<<esql-functions,function>>. Can use columns defined left of this one.

*Description*

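A minimal sketch of both rules above (shadowing an existing column and using columns defined to the left), matching the shadowingInternal eval csv-spec test added further down:

ROW x = 10000
// the second expression sees the freshly computed x, and its result shadows it again
| EVAL x = x + 1, x = x - 2
;

x:integer
9999
;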
15 changes: 15 additions & 0 deletions docs/reference/esql/processing-commands/grok.asciidoc
@@ -20,6 +20,9 @@ multiple values, `GROK` will process each value.

`pattern`::
A grok pattern.
In case a field name coincides with an existing column, the existing column is discarded.
Contributor suggested change: If a field name conflicts with an existing column, the existing column is discarded.

If a field name is used more than once, a multi-valued column will be created with one value
per each occurrence of the field name.

*Description*

@@ -67,4 +70,16 @@ include::{esql-specs}/docs.csv-spec[tag=grokWithToDatetime]
|===
include::{esql-specs}/docs.csv-spec[tag=grokWithToDatetime-result]
|===

In case a field name is used more than once, `GROK` creates a multi-valued
Contributor suggested change: start the sentence with "If a field name is used more than once" instead of "In case a field name is used more than once".

column:

[source.merge.styled,esql]
----
include::{esql-specs}/docs.csv-spec[tag=grokWithDuplicateFieldNames]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/docs.csv-spec[tag=grokWithDuplicateFieldNames-result]
|===
// end::examples[]
6 changes: 5 additions & 1 deletion docs/reference/esql/processing-commands/keep.asciidoc
@@ -16,6 +16,10 @@ KEEP columns

`columns`::
A comma-separated list of columns to keep. Supports wildcards.
In case a column name without wildcards occurs multiple times, the column is placed at the
position of the rightmost duplicate. The same is true if a column name matches multiple
wildcards. If a column name matches both a wildcard and also a column name without wildcards,
the position of the latter is used.
Contributor suggested change: start the sentence with "If a column name without wildcards occurs multiple times" instead of "In case a column name without wildcards occurs multiple times".


*Description*

@@ -29,7 +33,7 @@ Fields are added in the order they appear. If one field matches multiple express
2. Partial wildcard expressions (for example: `fieldNam*`)
3. Wildcard only (`*`)

If a field matches two expressions with the same precedence, the right-most expression wins.
If a field matches two expressions with the same precedence, the rightmost expression wins.

Refer to the examples for illustrations of these precedence rules.

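A sketch of the rightmost-duplicate positioning rule, assuming the standard employees test dataset (where emp_no 10001 is Georgi Facello). If the rule above holds, repeating first_name places it at the position of its rightmost occurrence, i.e. after last_name:

FROM employees
| SORT emp_no
// first_name occurs twice; it takes the position of the rightmost duplicate
| KEEP first_name, last_name, first_name
| LIMIT 1
;

last_name:keyword | first_name:keyword
Facello           | Georgi
;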
2 changes: 2 additions & 0 deletions docs/reference/esql/processing-commands/lookup.asciidoc
@@ -18,6 +18,8 @@ LOOKUP table ON match_field1[, match_field2, ...]

`table`::
The name of the `table` provided in the request to match.
If it has column names that conflict with existing columns, the table's columns replace the
Contributor suggested change: start the sentence with "If the table's column names conflict with existing columns" instead of "If it has column names that conflict with existing columns".

existing ones.

`match_field`::
The fields in the input to match against the table.
5 changes: 4 additions & 1 deletion docs/reference/esql/processing-commands/rename.asciidoc
@@ -17,7 +17,10 @@ RENAME old_name1 AS new_name1[, ..., old_nameN AS new_nameN]
The name of a column you want to rename.

`new_nameX`::
The new name of the column.
The new name of the column. If it conflicts with an existing column name,
the existing column is replaced by the renamed column. If multiple columns
are renamed to the same name, all but the rightmost column with the same new
name are dropped.

*Description*

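A sketch of the "all but the rightmost are dropped" rule, assuming the standard employees test dataset (emp_no 10001 is Georgi Facello):

FROM employees
| SORT emp_no
| KEEP first_name, last_name
// both columns are renamed to name; only the rightmost rename survives
| RENAME first_name AS name, last_name AS name
| LIMIT 1
;

name:keyword
Facello
;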
3 changes: 3 additions & 0 deletions docs/reference/esql/processing-commands/stats.asciidoc
@@ -18,12 +18,15 @@ STATS [column1 =] expression1[, ..., [columnN =] expressionN]
`columnX`::
The name by which the aggregated value is returned. If omitted, the name is
equal to the corresponding expression (`expressionX`).
If multiple columns have the same name, all but the rightmost column with this
name will be ignored.

`expressionX`::
An expression that computes an aggregated value.

`grouping_expressionX`::
An expression that outputs the values to group by.
If its name coincides with one of the computed columns, that column will be ignored.

NOTE: Individual `null` values are skipped when computing aggregations.

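A sketch of the duplicate-column-name rule for STATS, assuming the standard employees test dataset (where MAX(languages) is 5, as shown by the drop csv-spec tests further down):

FROM employees
// both aggregates are named x; only the rightmost one creates the column
| STATS x = COUNT(*), x = MAX(languages)
;

x:integer
5
;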
1 change: 1 addition & 0 deletions docs/reference/esql/source-commands/row.asciidoc
@@ -16,6 +16,7 @@ ROW column1 = value1[, ..., columnN = valueN]

`columnX`::
The column name.
In case of duplicate column names, only the rightmost duplicate creates a column.

`valueX`::
The value for the column. Can be a literal, an expression, or a
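A minimal sketch of the rightmost-duplicate rule for ROW:

// a is defined twice; only the rightmost duplicate creates the column
ROW a = 1, a = 2
;

a:integer
2
;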
@@ -96,8 +96,8 @@ public class CsvTestsDataLoader {
"cartesian_multipolygons.csv"
);
private static final TestsDataset DISTANCES = new TestsDataset("distances", "mapping-distances.json", "distances.csv");

private static final TestsDataset K8S = new TestsDataset("k8s", "k8s-mappings.json", "k8s.csv", "k8s-settings.json", true);
private static final TestsDataset ADDRESSES = new TestsDataset("addresses", "mapping-addresses.json", "addresses.csv", null, true);

public static final Map<String, TestsDataset> CSV_DATASET_MAP = Map.ofEntries(
Map.entry(EMPLOYEES.indexName, EMPLOYEES),
@@ -121,7 +121,8 @@ public class CsvTestsDataLoader {
Map.entry(AIRPORT_CITY_BOUNDARIES.indexName, AIRPORT_CITY_BOUNDARIES),
Map.entry(CARTESIAN_MULTIPOLYGONS.indexName, CARTESIAN_MULTIPOLYGONS),
Map.entry(K8S.indexName, K8S),
Map.entry(DISTANCES.indexName, DISTANCES)
Map.entry(DISTANCES.indexName, DISTANCES),
Map.entry(ADDRESSES.indexName, ADDRESSES)
);

private static final EnrichConfig LANGUAGES_ENRICH = new EnrichConfig("languages_policy", "enrich-policy-languages.json");
@@ -0,0 +1,4 @@
street:keyword,number:keyword,zip_code:keyword,city.name:keyword,city.country.name:keyword,city.country.continent.name:keyword,city.country.continent.planet.name:keyword,city.country.continent.planet.galaxy:keyword
Keizersgracht,281,1016 ED,Amsterdam,Netherlands,Europe,Earth,Milky Way
Kearny St,88,CA 94108,San Francisco,United States of America,North America,Earth,Milky Way
Marunouchi,2-7-2,100-7014,Tokyo,Japan,Asia,Earth,Milky Way
@@ -26,6 +26,19 @@ first_name:keyword | left:keyword | full_name:keyword | right:keyword | last_nam
Georgi | left | Georgi Facello | right | Facello
;

shadowingSubfields
FROM addresses
| KEEP city.country.continent.planet.name, city.country.name, city.name
| DISSECT city.name "%{city.country.continent.planet.name} %{?}"
| SORT city.name
;

city.country.name:keyword | city.name:keyword | city.country.continent.planet.name:keyword
Netherlands | Amsterdam | null
United States of America | San Francisco | San
Japan | Tokyo | null
;

shadowingSelf
FROM employees
| KEEP first_name, last_name
@@ -50,6 +63,18 @@ last_name:keyword | left:keyword | foo:keyword | middle:keyword | ri
Facello | left | Georgi1 Georgi2 Facello | middle | right | Georgi1 | Georgi2 | Facello
;

shadowingInternal
FROM employees
| KEEP first_name, last_name
| WHERE last_name == "Facello"
| EVAL name = concat(first_name, "1 ", last_name)
| DISSECT name "%{foo} %{foo}"
;

first_name:keyword | last_name:keyword | name:keyword | foo:keyword
Georgi | Facello | Georgi1 Facello | Facello
;


complexPattern
ROW a = "1953-01-23T12:15:00Z - some text - 127.0.0.1;"
@@ -436,6 +436,23 @@ ROW a = "1.2.3.4 [2023-01-23T12:15:00.000Z] Connected"
// end::grokWithEscape-result[]
;

grokWithDuplicateFieldNames
// tag::grokWithDuplicateFieldNames[]
FROM addresses
| KEEP city.name, zip_code
| GROK zip_code "%{WORD:zip_parts} %{WORD:zip_parts}"
// end::grokWithDuplicateFieldNames[]
| SORT city.name
;

// tag::grokWithDuplicateFieldNames-result[]
city.name:keyword | zip_code:keyword | zip_parts:keyword
Amsterdam | 1016 ED | ["1016", "ED"]
San Francisco | CA 94108 | ["CA", "94108"]
Tokyo | 100-7014 | null
// end::grokWithDuplicateFieldNames-result[]
;

basicDissect
// tag::basicDissect[]
ROW a = "2023-01-23T12:15:00.000Z - some text - 127.0.0.1"
@@ -122,3 +122,51 @@ FROM employees | STATS COUNT(*), MIN(salary * 10), MAX(languages)| DROP `COUNT(
MIN(salary * 10):i | MAX(languages):i
253240 | 5
;

// Not really shadowing, but let's keep the name consistent with the other command's tests
shadowingInternal
FROM employees
| KEEP emp_no, first_name, last_name
| DROP last_name, last_name
| LIMIT 2
;

emp_no:integer | first_name:keyword
10001 | Georgi
10002 | Bezalel
;

shadowingInternalWildcard
FROM employees
| KEEP emp_no, first_name, last_name
| DROP last*name, last*name, last*, last_name
| LIMIT 2
;

emp_no:integer | first_name:keyword
10001 | Georgi
10002 | Bezalel
;

subfields
FROM addresses
| DROP city.country.continent.planet.name, city.country.continent.name, city.country.name, number, street, zip_code, city.country.continent.planet.name
| SORT city.name
;

city.country.continent.planet.galaxy:keyword | city.name:keyword
Milky Way | Amsterdam
Milky Way | San Francisco
Milky Way | Tokyo
;

subfieldsWildcard
FROM addresses
| DROP *.name, number, street, zip_code, *ame
;

city.country.continent.planet.galaxy:keyword
Milky Way
Milky Way
Milky Way
;
@@ -69,6 +69,34 @@ ROW left = "left", foo = "foo", client_ip = "172.21.0.5", env = "env", right = "
left:keyword | client_ip:keyword | env:keyword | right:keyword | foo:keyword
;

shadowingSubfields
required_capability: enrich_load
FROM addresses
| KEEP city.country.continent.planet.name, city.country.name, city.name
| EVAL city.name = REPLACE(city.name, "San Francisco", "South San Francisco")
| ENRICH city_names ON city.name WITH city.country.continent.planet.name = airport
| SORT city.name
;

city.country.name:keyword | city.name:keyword | city.country.continent.planet.name:text
Netherlands | Amsterdam | null
United States of America | South San Francisco | San Francisco Int'l
Japan | Tokyo | null
;

shadowingSubfieldsLimit0
required_capability: enrich_load
FROM addresses
| KEEP city.country.continent.planet.name, city.country.name, city.name
| EVAL city.name = REPLACE(city.name, "San Francisco", "South San Francisco")
| ENRICH city_names ON city.name WITH city.country.continent.planet.name = airport
| SORT city.name
| LIMIT 0
;

city.country.name:keyword | city.name:keyword | city.country.continent.planet.name:text
;

shadowingSelf
required_capability: enrich_load
ROW left = "left", client_ip = "172.21.0.5", env = "env", right = "right"
@@ -107,6 +135,46 @@ ROW left = "left", airport = "Zurich Airport ZRH", city = "Zürich", middle = "m
left:keyword | city:keyword | middle:keyword | right:keyword | airport:text | region:text | city_boundary:geo_shape
;

shadowingInternal
required_capability: enrich_load
ROW city = "Zürich"
| ENRICH city_names ON city WITH x = airport, x = region
;

city:keyword | x:text
Zürich | Bezirk Zürich
;

shadowingInternalImplicit
required_capability: enrich_load
ROW city = "Zürich"
| ENRICH city_names ON city WITH airport = region
;

city:keyword | airport:text
Zürich | Bezirk Zürich
;

shadowingInternalImplicit2
required_capability: enrich_load
ROW city = "Zürich"
| ENRICH city_names ON city WITH airport, airport = region
;

city:keyword | airport:text
Zürich | Bezirk Zürich
;

shadowingInternalImplicit3
required_capability: enrich_load
ROW city = "Zürich"
| ENRICH city_names ON city WITH airport = region, airport
;

city:keyword | airport:text
Zürich | Zurich Int'l
;

simple
required_capability: enrich_load

@@ -15,6 +15,19 @@ left:keyword | right:keyword | x:integer
left | right | 1
;

shadowingSubfields
FROM addresses
| KEEP city.country.continent.planet.name, city.country.name, city.name
| EVAL city.country.continent.planet.name = to_upper(city.country.continent.planet.name)
| SORT city.name
;

city.country.name:keyword | city.name:keyword | city.country.continent.planet.name:keyword
Netherlands | Amsterdam | EARTH
United States of America | San Francisco | EARTH
Japan | Tokyo | EARTH
;

shadowingSelf
ROW left = "left", x = 10000 , right = "right"
| EVAL x = x + 1
@@ -33,6 +46,16 @@ left:keyword | middle:keyword | right:keyword | x:integer | y:integer
left | middle | right | 9 | 10
;

shadowingInternal
ROW x = 10000
| EVAL x = x + 1, x = x - 2
;

x:integer
9999
;


withMath
row a = 1 | eval b = 2 + 3;
