PPL describe command #646

Merged
merged 20 commits into from
Jun 28, 2022
unit test
Signed-off-by: Sean Kao <seankao@amazon.com>
seankao-az committed Jun 22, 2022
commit dd6f87e8e21b42213e0146c14458f8dc28ebfd09
@@ -112,5 +112,25 @@ public void can_parse_simple_query_string_relevance_function() {
"SOURCE=test | WHERE simple_query_string([\"Tags\" ^ 1.5, Title, `Body` 4.2], 'query',"
+ "analyzer=keyword, quote_field_suffix=\".exact\", fuzzy_prefix_length = 4)"));
  }

  @Test
  public void testDescribeCommandShouldPass() {
    ParseTree tree = new PPLSyntaxParser().analyzeSyntax("describe t");
    assertNotEquals(null, tree);
  }

  @Test
  public void testDescribeFieldsCommandShouldPass() {
    ParseTree tree = new PPLSyntaxParser().analyzeSyntax("describe t | fields a,b");
Member

What is the use case for this example? It seems a little odd, because if someone wants a,b as the result, they would specify a,b directly instead of describing the table and filtering again.

The fields command projects the mentioned columns from the result set. A plausible use case could be describe t | fields 2, 3, which implies give me the second and third column names.

Also, if someone appends other commands to describe, what is the expected behavior? I am assuming we will be calculating on the result set provided by prior describe command.

Collaborator Author

@seankao-az Jun 24, 2022

I am assuming we will be calculating on the result set provided by prior describe command.

That's correct. When appending other commands to describe, the behavior is to query the metadata table, instead of the data table itself (as expected for the pipe syntax).
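
As a rough illustration of that behavior, the sketch below rewrites a describe command into the name of the metadata relation it would scan, so that any piped command sees metadata rows rather than documents. It does not use the plugin's classes: the `SYS_MAPPINGS` prefix, the class name, and the string-based parsing are all hypothetical stand-ins for the `mappingTable` helper exercised in the AST tests of this PR.

```java
// Hypothetical sketch; not the plugin's parser or SystemIndexUtils.
final class DescribeRewriteSketch {
  // Assumed prefix marking a metadata ("mapping") table; the real value may differ.
  static final String MAPPING_PREFIX = "SYS_MAPPINGS";

  // Stand-in for the mappingTable helper used in the AST tests.
  static String mappingTable(String indexName) {
    return MAPPING_PREFIX + "." + indexName;
  }

  // Returns the relation a query starting with "describe <index>" would scan.
  // Anything after the first pipe is ignored here; it would run on the metadata rows.
  static String relationFor(String query) {
    String head = query.split("\\|")[0].trim();
    String keyword = "describe ";
    if (!head.regionMatches(true, 0, keyword, 0, keyword.length())) {
      throw new IllegalArgumentException("not a describe command: " + query);
    }
    String target = head.substring(keyword.length()).trim();
    if (target.isEmpty() || target.contains("=")) {
      // Mirrors the syntax test where "describe source=t" is rejected.
      throw new IllegalArgumentException("describe takes a bare index name: " + query);
    }
    return mappingTable(target);
  }

  public static void main(String[] args) {
    System.out.println(relationFor("describe t | fields COLUMN_NAME")); // prints SYS_MAPPINGS.t
  }
}
```

The point of the sketch is only that the relation changes before any piped command runs, which is why fields and where refer to metadata columns.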

An example of the usage of fields can be seen here

The fields do not refer to the data table's fields, but the metadata table's fields, because the result set of the describe command is a metadata table. Here you can see the full list of such fields.

Collaborator Author

@seankao-az Jun 25, 2022

A plausible use case could be describe t | fields 2, 3, which implies give me the second and third column names.

Interestingly, we could support that using the following syntax:

describe t | where ORDINAL_POSITION=2 or ORDINAL_POSITION=3 | fields COLUMN_NAME

However, it doesn't quite work yet, due to a type mismatch (snippet 1, snippet 2). Also, most of the metadata is meaningless right now, including the order of the columns.
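
For reference, the intended semantics of that where/fields pipeline can be sketched in plain Java over an in-memory metadata table. This is an assumption-laden illustration, not plugin code: the rows are invented, ORDINAL_POSITION is assumed to be integer-typed and meaningful (which is not the case today), and the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of "describe t | where ORDINAL_POSITION=2 or ORDINAL_POSITION=3 | fields COLUMN_NAME".
final class DescribePipelineSketch {
  // Invented metadata rows, one per column of a hypothetical index "t".
  static List<Map<String, Object>> describe() {
    return List.of(
        Map.of("COLUMN_NAME", "a", "ORDINAL_POSITION", 1),
        Map.of("COLUMN_NAME", "b", "ORDINAL_POSITION", 2),
        Map.of("COLUMN_NAME", "c", "ORDINAL_POSITION", 3));
  }

  // where: keep rows at position 2 or 3; fields: project COLUMN_NAME only.
  static List<String> secondAndThirdColumnNames(List<Map<String, Object>> rows) {
    return rows.stream()
        .filter(row -> {
          int position = (Integer) row.get("ORDINAL_POSITION");
          return position == 2 || position == 3;
        })
        .map(row -> (String) row.get("COLUMN_NAME"))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    System.out.println(secondAndThirdColumnNames(describe())); // prints [b, c]
  }
}
```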

$ curl .... '{"query": "describe opensearch_dashboards_sample_data_flights | where ORDINAL_POSITION=0"}'

{
  "error": {
    "reason": "Invalid Query",
    "details": "= function expected {[BYTE,BYTE],[SHORT,SHORT],[INTEGER,INTEGER],[LONG,LONG],[FLOAT,FLOAT],[DOUBLE,DOUBLE],[STRING,STRING],[BOOLEAN,BOOLEAN],[TIMESTAMP,TIMESTAMP],[DATE,DATE],[TIME,TIME],[DATETIME,DATETIME],[INTERVAL,INTERVAL],[STRUCT,STRUCT],[ARRAY,ARRAY]}, but get [STRING,INTEGER]",
    "type": "ExpressionEvaluationException"
  },
  "status": 400
}

$ curl .... '{"query": "describe opensearch_dashboards_sample_data_flights | where ORDINAL_POSITION=\"0\""}'

{
  "error": {
    "reason": "Invalid Query",
    "details": "invalid to get integerValue from value of type STRING",
    "type": "ExpressionEvaluationException"
  },
  "status": 400
}
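
The first error has the shape of a signature check: every accepted pair in the list is [T,T], and the query supplied [STRING,INTEGER]. A minimal sketch of such a strict equality follows. It is a toy model with hypothetical names (`StrictEqualsSketch`, `Value`, `Type`), not the plugin's function resolver, and it does not model the second error, which appears to arise later, when values are read during evaluation.

```java
// Hypothetical sketch of an equality operator that only accepts same-typed operands.
final class StrictEqualsSketch {
  enum Type { INTEGER, STRING, BOOLEAN }

  // A value tagged with its declared type.
  static final class Value {
    final Type type;
    final Object raw;

    Value(Type type, Object raw) {
      this.type = type;
      this.raw = raw;
    }
  }

  // Rejects mixed-type pairs before comparing, like a resolver with only [T,T] signatures.
  static boolean strictEquals(Value left, Value right) {
    if (left.type != right.type) {
      throw new IllegalArgumentException(
          "= expects operands of the same type, but got [" + left.type + "," + right.type + "]");
    }
    return left.raw.equals(right.raw);
  }

  public static void main(String[] args) {
    Value declaredAsString = new Value(Type.STRING, "0");
    Value integerLiteral = new Value(Type.INTEGER, 0);
    try {
      strictEquals(declaredAsString, integerLiteral);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Under this model, the filter can only resolve once the declared type of ORDINAL_POSITION and the literal's type agree.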

    assertNotEquals(null, tree);
  }

  @Test
  public void testDescribeCommandWithSourceShouldFail() {
    exceptionRule.expect(RuntimeException.class);
    exceptionRule.expectMessage("Failed to parse query due to offending symbol");

    new PPLSyntaxParser().analyzeSyntax("describe source=t");
  }
}

@@ -30,6 +30,7 @@
import static org.opensearch.sql.ast.dsl.AstDSL.map;
import static org.opensearch.sql.ast.dsl.AstDSL.nullLiteral;
import static org.opensearch.sql.ast.dsl.AstDSL.parse;
import static org.opensearch.sql.ast.dsl.AstDSL.project;
import static org.opensearch.sql.ast.dsl.AstDSL.projectWithArg;
import static org.opensearch.sql.ast.dsl.AstDSL.qualifiedName;
import static org.opensearch.sql.ast.dsl.AstDSL.rareTopN;
@@ -38,13 +39,15 @@
import static org.opensearch.sql.ast.dsl.AstDSL.sort;
import static org.opensearch.sql.ast.dsl.AstDSL.span;
import static org.opensearch.sql.ast.dsl.AstDSL.stringLiteral;
import static org.opensearch.sql.utils.SystemIndexUtils.mappingTable;

import com.google.common.collect.ImmutableMap;
import org.junit.Ignore;
import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.ExpectedException;
import org.opensearch.sql.ast.Node;
import org.opensearch.sql.ast.expression.AllFields;
import org.opensearch.sql.ast.expression.DataType;
import org.opensearch.sql.ast.expression.Literal;
import org.opensearch.sql.ast.expression.SpanUnit;
@@ -447,12 +450,22 @@ public void testIndexName() {
            relation("log.2020.04.20."),
            compare("=", field("a"), intLiteral(1))
        ));
    assertEqual("describe log.2020.04.20.",
        project(
            relation(mappingTable("log.2020.04.20.")),
            AllFields.of()
        ));
  }

  @Test
  public void testIdentifierAsIndexNameStartWithDot() {
    assertEqual("source=.opensearch_dashboards",
        relation(".opensearch_dashboards"));
    assertEqual("describe .opensearch_dashboards",
        project(
            relation(mappingTable(".opensearch_dashboards")),
            AllFields.of()
        ));
  }

  @Test
@@ -603,6 +616,16 @@ public void testKmeansCommandWithoutParameter() {
        new Kmeans(relation("t"), ImmutableMap.of()));
  }

  @Test
  public void testDescribeCommand() {
    assertEqual("describe t",
        project(
            relation(mappingTable("t")),
            AllFields.of()
        )
    );
  }

  @Test
  public void test_fitRCFADCommand_withoutDataFormat() {
    assertEqual("source=t | AD shingle_size=10 time_decay=0.0001 time_field='timestamp' "