This repository has been archived by the owner on Aug 2, 2022. It is now read-only.
Support SELECT * and FROM clause in new SQL parser (#573)
* Support from
* Add more UT
* Update doc
* Update doc
* Add doctest
* Add IT
* Change doc and grammar for ANSI SQL
* Change doc and grammar
* Split grammar file
* Prepare PR
* Prepare PR
* Run IT with/without new engine
* Address PR comments
* Address PR comments
Showing 19 changed files with 550 additions and 53 deletions.
@@ -0,0 +1,108 @@

===========
Identifiers
===========

.. rubric:: Table of contents

.. contents::
   :local:
   :depth: 2


Introduction
============

Identifiers are used to name database objects such as indices, fields, and aliases. There are two types of identifiers: regular identifiers and delimited identifiers.


Regular Identifiers
===================

Description
-----------

According to the ANSI SQL standard, a regular identifier is a string of characters that must start with an ASCII letter (lower or upper case). Each subsequent character can be a letter, a digit, or an underscore (``_``). A regular identifier cannot be a reserved keyword, and it cannot contain whitespace or other special characters. In addition, our SQL parser extends these rules for Elasticsearch storage, as described in the next sub-section.

Extensions
----------

For convenience, our SQL parser also accepts the following Elasticsearch-specific identifiers without delimiters (delimited identifiers are covered in the next section):

1. Identifiers prefixed by a dot ``.``: such an index is called a hidden index in Elasticsearch, for example ``.kibana``.
2. Identifiers prefixed by an at sign ``@``: this is common for meta fields generated during Logstash ingestion.
3. Identifiers with ``-`` in the middle: this is mostly the case for index names containing date information.
4. Identifiers containing a star ``*``: this is mostly an index pattern for wildcard matching.

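The rules above can be approximated with regular expressions. The following Java sketch is illustrative only: the class ``UnquotedIdentifierCheck``, its small reserved-word set and both patterns are assumptions rather than the parser's ``ID`` lexer rule (which is not shown in this commit), and the positions allowed for ``@``, ``*`` and ``-`` are inferred from the list above.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical approximation of the unquoted-identifier rules described above. */
final class UnquotedIdentifierCheck {

  // ANSI SQL regular identifier: an ASCII letter, then letters, digits or underscores.
  private static final Pattern REGULAR = Pattern.compile("[A-Za-z][A-Za-z0-9_]*");

  // Assumed shape of the Elasticsearch extensions: an optional leading '.' or '@',
  // a letter or '*' first, then letters, digits, '_' or '*', with '-' allowed only
  // between such characters (never at the start or end).
  private static final Pattern EXTENDED =
      Pattern.compile("[.@]?[A-Za-z*][A-Za-z0-9_*]*(-[A-Za-z0-9_*]+)*");

  // Illustrative subset only; the real keyword list lives in the lexer grammar.
  private static final Set<String> RESERVED = Set.of("SELECT", "FROM", "WHERE", "AS");

  private static boolean isReserved(String s) {
    return RESERVED.contains(s.toUpperCase(Locale.ROOT));
  }

  /** Strict ANSI-style check without the Elasticsearch extensions. */
  static boolean isRegular(String s) {
    return REGULAR.matcher(s).matches() && !isReserved(s);
  }

  /** ANSI rule plus the four extensions; EXTENDED already covers every REGULAR match. */
  static boolean isAcceptedWithoutQuotes(String s) {
    return EXTENDED.matcher(s).matches() && !isReserved(s);
  }

  public static void main(String[] args) {
    String[] samples = {"logs_2020_01", ".kibana", "@timestamp", "logs-2020-01",
        "*cc*nt*", "select", "logs.2020.01", "logs+2020+01"};
    for (String s : samples) {
      System.out.println(s + " -> " + isAcceptedWithoutQuotes(s));
    }
  }
}
```

Names such as ``logs.2020.01`` or ``logs+2020+01`` fail both checks in this sketch, which is why they need the delimited form described in the next section.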
Examples
--------

Here is an example of using an index pattern directly without quotes::

    od> SELECT * FROM *cc*nt*;
    fetched rows / total rows = 4/4
    +------------------+-------------+----------------------+-----------+----------+--------+------------+---------+-------+-----------------------+------------+
    | account_number   | firstname   | address              | balance   | gender   | city   | employer   | state   | age   | email                 | lastname   |
    |------------------+-------------+----------------------+-----------+----------+--------+------------+---------+-------+-----------------------+------------|
    | 1                | Amber       | 880 Holmes Lane      | 39225     | M        | Brogan | Pyrami     | IL      | 32    | amberduke@pyrami.com  | Duke       |
    | 6                | Hattie      | 671 Bristol Street   | 5686      | M        | Dante  | Netagy     | TN      | 36    | hattiebond@netagy.com | Bond       |
    | 13               | Nanette     | 789 Madison Street   | 32838     | F        | Nogal  | Quility    | VA      | 28    | null                  | Bates      |
    | 18               | Dale        | 467 Hutchinson Court | 4180      | M        | Orick  | null       | MD      | 33    | daleadams@boink.com   | Adams      |
    +------------------+-------------+----------------------+-----------+----------+--------+------------+---------+-------+-----------------------+------------+


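The parser only needs to accept ``*cc*nt*`` as an identifier; resolving the pattern to concrete indices is presumably left to Elasticsearch. To show what such a pattern matches, here is a hypothetical Java sketch of plain ``*`` wildcard matching (``IndexPatternMatcher`` is an illustrative name, not Elasticsearch's implementation, and other index-expression syntax is not modeled): each ``*`` matches any run of characters and everything else matches literally.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical illustration of '*' wildcard matching against index names. */
final class IndexPatternMatcher {

  /** Turns e.g. "*cc*nt*" into a regex: each '*' becomes ".*", other text is quoted literally. */
  static Pattern toRegex(String pattern) {
    // The -1 limit keeps the trailing empty string after a final '*', so it still yields ".*".
    String regex = Arrays.stream(pattern.split("\\*", -1))
        .map(Pattern::quote)
        .collect(Collectors.joining(".*"));
    return Pattern.compile(regex);
  }

  /** Returns the names the pattern matches; matches() requires the whole name to match. */
  static List<String> matching(String pattern, List<String> indexNames) {
    Pattern regex = toRegex(pattern);
    return indexNames.stream()
        .filter(name -> regex.matcher(name).matches())
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> indices = List.of("accounts", "bank", "logs-2020-01");
    System.out.println(matching("*cc*nt*", indices)); // [accounts]
  }
}
```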
Delimited Identifiers
=====================

Description
-----------

A delimited identifier is an identifier enclosed in back ticks ````` or double quotation marks ``"``. The enclosed identifier does not need to be a regular identifier; in other words, it can contain special characters that a regular identifier does not allow.

Note the difference between single quotes and double quotes in SQL syntax: single quotes enclose a string literal, while double quotes, like back ticks, delimit an identifier so that it can contain special characters.

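A minimal Java sketch of recovering the raw name from a delimited identifier, and of why a single-quoted token is different: the hypothetical helper ``Delimited.unquote`` strips one matching pair of back ticks or double quotes and rejects single quotes, which enclose string literals. It assumes the enclosed text contains no escaped delimiter; how the real ``DOUBLE_QUOTE_ID`` and ``BACKTICK_QUOTE_ID`` lexer rules treat escapes is not shown in this commit.

```java
import java.util.Optional;

/** Hypothetical helper: recover the raw name from a delimited identifier token. */
final class Delimited {

  /** Returns the enclosed name for `...` or "..." tokens, or empty if the token is not delimited. */
  static Optional<String> unquote(String token) {
    if (token.length() > 2) {
      char first = token.charAt(0);
      char last = token.charAt(token.length() - 1);
      // Back ticks and double quotes delimit identifiers; single quotes mark string literals.
      if ((first == '`' || first == '"') && last == first) {
        return Optional.of(token.substring(1, token.length() - 1));
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    System.out.println(unquote("`logs+2020+01`"));   // Optional[logs+2020+01]
    System.out.println(unquote("\"logs.2020.01\"")); // Optional[logs.2020.01]
    System.out.println(unquote("'logs.2020.01'"));   // Optional.empty
  }
}
```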
Use Cases
---------

Here are typical cases that require delimited identifiers:

1. Identifiers that are reserved keywords.
2. Identifiers containing a dot ``.``: as with ``-``, a dot is often used in index names to include date information, but such names must be quoted so the parser can distinguish them from qualified identifiers.
3. Identifiers with other special characters: Elasticsearch has its own naming rules, which allow more special characters; for example, Unicode characters are supported in index names.

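When SQL text is generated from an arbitrary index name, the three cases above suggest quoting only when needed. A minimal Java sketch, assuming a hypothetical ``IdentifierQuoter`` class with a small illustrative reserved-word set (not the parser's real list); it deliberately ignores the Elasticsearch extensions, so it may quote names the parser would also accept unquoted, and it refuses names containing a back tick rather than guessing an escape syntax.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical helper that back-tick-quotes a name only when a delimited identifier is needed. */
final class IdentifierQuoter {

  // Plain ANSI regular identifier; the Elasticsearch extensions are ignored for simplicity,
  // so names such as logs-2020-01 are quoted even though the parser accepts them unquoted.
  private static final Pattern SAFE = Pattern.compile("[A-Za-z][A-Za-z0-9_]*");

  // Illustrative subset of reserved keywords, not the parser's real list.
  private static final Set<String> RESERVED = Set.of("SELECT", "FROM", "WHERE", "ORDER", "GROUP");

  static String quoteIfNeeded(String name) {
    if (name.indexOf('`') >= 0) {
      // Escaping an embedded back tick is not covered by this sketch.
      throw new IllegalArgumentException("Back tick in name is not supported: " + name);
    }
    boolean safe = SAFE.matcher(name).matches()
        && !RESERVED.contains(name.toUpperCase(Locale.ROOT));
    return safe ? name : "`" + name + "`";
  }

  public static void main(String[] args) {
    for (String name : new String[] {"accounts", "order", "logs.2020.01", "logs+2020+01"}) {
      System.out.println("SELECT * FROM " + quoteIfNeeded(name));
    }
  }
}
```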
Examples
--------

Here is an example of quoting an index name with back ticks::

    od> SELECT * FROM `accounts`;
    fetched rows / total rows = 4/4
    +------------------+-------------+----------------------+-----------+----------+--------+------------+---------+-------+-----------------------+------------+
    | account_number   | firstname   | address              | balance   | gender   | city   | employer   | state   | age   | email                 | lastname   |
    |------------------+-------------+----------------------+-----------+----------+--------+------------+---------+-------+-----------------------+------------|
    | 1                | Amber       | 880 Holmes Lane      | 39225     | M        | Brogan | Pyrami     | IL      | 32    | amberduke@pyrami.com  | Duke       |
    | 6                | Hattie      | 671 Bristol Street   | 5686      | M        | Dante  | Netagy     | TN      | 36    | hattiebond@netagy.com | Bond       |
    | 13               | Nanette     | 789 Madison Street   | 32838     | F        | Nogal  | Quility    | VA      | 28    | null                  | Bates      |
    | 18               | Dale        | 467 Hutchinson Court | 4180      | M        | Orick  | null       | MD      | 33    | daleadams@boink.com   | Adams      |
    +------------------+-------------+----------------------+-----------+----------+--------+------------+---------+-------+-----------------------+------------+


Case Sensitivity
================

Description
-----------

In SQL-92, regular identifiers are case insensitive and are converted to upper case automatically, just like keywords, while the characters in a delimited identifier are kept as they are. In our SQL implementation, however, identifiers are case sensitive: an identifier must exactly match the name stored in Elasticsearch. This differs from the ANSI standard.

Examples
--------

For example, running ``SELECT * FROM ACCOUNTS`` results in an index not found exception from our plugin because the actual index name is in lower case.

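The contrast can be sketched in a few lines of Java. ``CaseSensitivityDemo`` is a hypothetical class, and its in-memory set is a stand-in for the indices stored in Elasticsearch, not plugin code: under SQL-92 folding, ``ACCOUNTS`` and ``accounts`` normalize to the same name, while a verbatim lookup only finds the lower-case spelling.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical contrast between SQL-92 identifier folding and case-sensitive lookup. */
final class CaseSensitivityDemo {

  // Stand-in for the index names stored in Elasticsearch (index names must be lower case there).
  private static final Set<String> STORED_INDICES = Set.of("accounts");

  /** SQL-92: a regular identifier is folded to upper case; a delimited one is kept verbatim. */
  static String normalizeSql92(String name, boolean delimited) {
    return delimited ? name : name.toUpperCase(Locale.ROOT);
  }

  /** Behavior described above: the identifier is used exactly as written. */
  static String normalizeHere(String name) {
    return name;
  }

  static boolean indexExists(String name) {
    return STORED_INDICES.contains(normalizeHere(name));
  }

  public static void main(String[] args) {
    // Under SQL-92 folding, ACCOUNTS and accounts name the same object.
    System.out.println(normalizeSql92("ACCOUNTS", false).equals(normalizeSql92("accounts", false)));
    // Without folding, only the exact lower-case spelling finds the index.
    System.out.println(indexExists("ACCOUNTS") + " " + indexExists("accounts"));
  }
}
```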

Identifier Qualifiers
=====================

For now, we do not support using an Elasticsearch cluster name as a catalog name to qualify an index name, such as ``my-cluster.logs``.

TODO: field name qualifiers
98 changes: 98 additions & 0 deletions
integ-test/src/test/java/com/amazon/opendistroforelasticsearch/sql/sql/IdentifierIT.java
@@ -0,0 +1,98 @@
/*
 * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
 *
 * Licensed under the Apache License, Version 2.0 (the "License").
 * You may not use this file except in compliance with the License.
 * A copy of the License is located at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * or in the "license" file accompanying this file. This file is distributed
 * on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
 * express or implied. See the License for the specific language governing
 * permissions and limitations under the License.
 *
 */

package com.amazon.opendistroforelasticsearch.sql.sql;

import static com.amazon.opendistroforelasticsearch.sql.util.TestUtils.createHiddenIndexByRestClient;
import static com.amazon.opendistroforelasticsearch.sql.util.TestUtils.performRequest;

import com.amazon.opendistroforelasticsearch.sql.legacy.SQLIntegTestCase;
import java.io.IOException;
import org.elasticsearch.client.Request;
import org.junit.jupiter.api.Test;

/**
 * Integration tests for identifiers including index and field name symbol.
 */
public class IdentifierIT extends SQLIntegTestCase {

  @Test
  public void testIndexNames() throws IOException {
    createIndexWithOneDoc("logs", "logs_2020_01");
    queryAndAssertTheDoc("SELECT * FROM logs");
    queryAndAssertTheDoc("SELECT * FROM logs_2020_01");
  }

  @Test
  public void testSpecialIndexNames() throws IOException {
    createIndexWithOneDoc(".system", "logs-2020-01");
    queryAndAssertTheDoc("SELECT * FROM .system");
    queryAndAssertTheDoc("SELECT * FROM logs-2020-01");
  }

  @Test
  public void testQuotedIndexNames() throws IOException {
    createIndexWithOneDoc("logs+2020+01", "logs.2020.01");
    queryAndAssertTheDoc("SELECT * FROM `logs+2020+01`");
    queryAndAssertTheDoc("SELECT * FROM \"logs.2020.01\"");
  }

  private void createIndexWithOneDoc(String... indexNames) throws IOException {
    for (String indexName : indexNames) {
      new Index(indexName).addDoc("{\"age\": 30}");
    }
  }

  private void queryAndAssertTheDoc(String sql) {
    assertEquals(
        "{\n"
            + " \"schema\": [{\n"
            + " \"name\": \"age\",\n"
            + " \"type\": \"integer\"\n"
            + " }],\n"
            + " \"total\": 1,\n"
            + " \"datarows\": [[30]],\n"
            + " \"size\": 1\n"
            + "}\n",
        executeQuery(sql.replace("\"", "\\\""), "jdbc")
    );
  }

  /**
   * Index abstraction for test code readability.
   */
  private static class Index {

    private final String indexName;

    Index(String indexName) throws IOException {
      this.indexName = indexName;

      if (indexName.startsWith(".")) {
        createHiddenIndexByRestClient(client(), indexName, "");
      } else {
        executeRequest(new Request("PUT", "/" + indexName));
      }
    }

    void addDoc(String doc) {
      Request indexDoc = new Request("POST", String.format("/%s/_doc?refresh=true", indexName));
      indexDoc.setJsonEntity(doc);
      performRequest(client(), indexDoc);
    }
  }

}
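`queryAndAssertTheDoc` escapes double quotes before calling `executeQuery`, presumably because the SQL text is embedded verbatim into a JSON request body of the form `{"query": "..."}` (the body format used by `executeQuery` is not part of this diff, so that shape is an assumption). The hypothetical `QueryBody` sketch below shows why an unescaped `"logs.2020.01"` would otherwise end the JSON string early. It also escapes backslashes, which the test does not need because its queries contain none, and it does not handle control characters.

```java
/** Hypothetical sketch of embedding a SQL string into a {"query": "..."} JSON body. */
final class QueryBody {

  /** Escapes backslashes first, then double quotes, so the SQL stays inside one JSON string. */
  static String escapeForJson(String sql) {
    return sql.replace("\\", "\\\\").replace("\"", "\\\"");
  }

  static String buildQueryBody(String sql) {
    return "{\"query\": \"" + escapeForJson(sql) + "\"}";
  }

  public static void main(String[] args) {
    // Prints {"query": "SELECT * FROM \"logs.2020.01\""}
    System.out.println(buildQueryBody("SELECT * FROM \"logs.2020.01\""));
  }
}
```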
@@ -1,2 +1,4 @@
SELECT 1 + 2 FROM kibana_sample_data_flights
SELECT abs(-10) FROM kibana_sample_data_flights
SELECT DistanceMiles FROM kibana_sample_data_flights
SELECT AvgTicketPrice, Carrier FROM kibana_sample_data_flights WHERE AvgTicketPrice <= 500
@@ -0,0 +1,45 @@
/*
MySQL (Positive Technologies) grammar
The MIT License (MIT).
Copyright (c) 2015-2017, Ivan Kochurkin ([email protected]), Positive Technologies.
Copyright (c) 2017, Ivan Khudyashev ([email protected])
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
*/

parser grammar OpenDistroSQLIdentifierParser;

options { tokenVocab=OpenDistroSQLLexer; }


// Identifiers

tableName
    : qualifiedName
    ;

qualifiedName
    : ident (DOT ident)*
    ;

ident
    : DOT? ID
    | DOUBLE_QUOTE_ID
    | BACKTICK_QUOTE_ID
    ;
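The `qualifiedName` and `ident` rules can be mirrored by a small hand-written splitter, which also shows why an unquoted `logs.2020.01` reads as three parts while `"logs.2020.01"` stays one. This is a hypothetical Java sketch, not the ANTLR-generated parser: the class `QualifiedNameSplitter` is illustrative, it assumes `ID` consists of ASCII letters, digits, `_`, `-`, `*` and `@` (the real lexer rule is not shown here), it does not model how the real lexer tokenizes digit-only parts such as `2020`, and it strips the delimiters from quoted parts.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical hand-written mirror of: qualifiedName : ident (DOT ident)* ; */
final class QualifiedNameSplitter {

  private final String text;
  private int pos;

  private QualifiedNameSplitter(String text) {
    this.text = text;
  }

  /** Returns the identifier parts, with delimiters removed from quoted parts. */
  static List<String> split(String text) {
    QualifiedNameSplitter p = new QualifiedNameSplitter(text);
    List<String> parts = new ArrayList<>();
    parts.add(p.ident());
    while (p.pos < text.length()) {
      p.expect('.'); // the DOT separating two idents
      parts.add(p.ident());
    }
    return parts;
  }

  /** ident : DOT? ID | DOUBLE_QUOTE_ID | BACKTICK_QUOTE_ID ; */
  private String ident() {
    if (pos < text.length() && (text.charAt(pos) == '"' || text.charAt(pos) == '`')) {
      char quote = text.charAt(pos);
      int end = text.indexOf(quote, pos + 1);
      if (end < 0) {
        throw new IllegalArgumentException("Unterminated quoted identifier at " + pos);
      }
      String name = text.substring(pos + 1, end);
      pos = end + 1;
      return name;
    }
    int start = pos;
    if (pos < text.length() && text.charAt(pos) == '.') {
      pos++; // optional leading DOT, e.g. .kibana
    }
    int idStart = pos;
    while (pos < text.length() && isIdChar(text.charAt(pos))) {
      pos++;
    }
    if (pos == idStart) {
      throw new IllegalArgumentException("Expected identifier at " + pos);
    }
    return text.substring(start, pos);
  }

  // Assumed ID character set; the real ID token is defined in OpenDistroSQLLexer.
  private static boolean isIdChar(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
        || c == '_' || c == '-' || c == '*' || c == '@';
  }

  private void expect(char c) {
    if (pos >= text.length() || text.charAt(pos) != c) {
      throw new IllegalArgumentException("Expected '" + c + "' at " + pos);
    }
    pos++;
  }

  public static void main(String[] args) {
    System.out.println(split("logs.2020.01"));     // [logs, 2020, 01]
    System.out.println(split("\"logs.2020.01\"")); // [logs.2020.01]
    System.out.println(split(".kibana"));          // [.kibana]
  }
}
```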