Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] GROUP BY Missing field returns error #743

Closed
deathjoin opened this issue Aug 10, 2022 · 6 comments · Fixed by #1191
Closed

[BUG] GROUP BY Missing field returns error #743

deathjoin opened this issue Aug 10, 2022 · 6 comments · Fixed by #1191
Assignees
Labels
enhancement New feature or request legacy Issues related to legacy query engine to be deprecated SQL v2.5.0 'Issues and PRs related to version v2.5.0'

Comments

@deathjoin
Copy link

What is the bug?
When doing GROUP BY field that is missing in some documents OS returns error

How can one reproduce the bug?
Steps to reproduce the behavior:

  1. Create index
PUT test-nulls
  1. Put some docs one with missing field
POST _bulk
{ "create": { "_index": "test-nulls", "_id": "1" } }
{ "type": "first", "message": "Hello" }
{ "create": { "_index": "test-nulls", "_id": "2" } }
{ "type": "first", "message": "Hello" }
{ "create": { "_index": "test-nulls", "_id": "3" } }
{ "type": "second", "message": "World" }
{ "create": { "_index": "test-nulls", "_id": "4" } }
{ "message": "Whatever" }
  1. Run SQL (fetch_size isn't important)
POST /_plugins/_sql
{
    "query": "SELECT type, COUNT(*) FROM test-nulls GROUP BY type"
    "fetch_size": 2000
}
  1. See error
{
  "error": {
    "reason": "There was internal problem at backend",
    "details": "invalid value operation on MISSING_VALUE",
    "type": "IllegalStateException"
  },
  "status": 500
}

What is the expected behavior?
I expect missing field would be treated like null and response would be

{
  "columns": [
    { "name": "type", "type": "text" },
    { "name": "COUNT(*)", "type": "long" }
  ],
  "rows": [
    [null, 1],
    ["first", 2],
    ["second", 1]
  ]
}

What is your host/environment?

  • OS: MacOS (Docker compose example from official docs)
  • Version Opensearch 2.1.0
  • Plugins default

Do you have any screenshots?
No

Do you have any additional context?
When selecting without group by value returned as null:

POST /_plugins/_sql
{
    "query": """
SELECT type,message FROM test-nulls
    """
}

Response:

{
  "schema": [
    {
      "name": "type",
      "type": "text"
    },
    {
      "name": "message",
      "type": "text"
    }
  ],
  "total": 4,
  "datarows": [
    [
      "first",
      "Hello"
    ],
    [
      "first",
      "Hello"
    ],
    [
      "second",
      "World"
    ],
    [
      null,
      "Whatever"
    ]
  ],
  "size": 4,
  "status": 200
}

Also node logs:

[2022-08-10T13:22:45,770][INFO ][o.o.s.l.p.RestSqlAction  ] [opensearch-node1] [c71f8fb0-9e9e-45f6-9a84-28ee0a5c4ab2] Incoming request /_plugins/_sql?pretty=true: ( SELECT identifier, COUNT(*) FROM table GROUP BY identifier )
[2022-08-10T13:22:45,818][WARN ][stderr                   ] [opensearch-node1] line 2:7 mismatched input 'type' expecting {'ALL', 'CASE', 'CAST', 'DISTINCT', 'FALSE', 'FIRST', 'LAST', 'LEFT', 'MATCH', 'NOT', 'NULL', 'RIGHT', 'TRUE', 'AVG', 'COUNT', 'MAX', 'MIN', 'SUM', 'VAR_POP', 'VAR_SAMP', 'VARIANCE', 'STD', 'STDDEV', 'STDDEV_POP', 'STDDEV_SAMP', 'SUBSTRING', 'TRIM', 'FULL', 'INTERVAL', 'MICROSECOND', 'SECOND', 'MINUTE', 'HOUR', 'DAY', 'WEEK', 'MONTH', 'QUARTER', 'YEAR', 'ABS', 'ACOS', 'ASCII', 'ASIN', 'ATAN', 'ATAN2', 'CEIL', 'CEILING', 'CONCAT', 'CONCAT_WS', 'CONV', 'COS', 'COT', 'CRC32', 'DATE', 'DATE_FORMAT', 'DATE_ADD', 'DATE_SUB', 'DAYOFMONTH', 'DAYOFWEEK', 'DAYOFYEAR', 'DAYNAME', 'DEGREES', 'E', 'EXP', 'FLOOR', 'FROM_DAYS', 'IF', 'IFNULL', 'ISNULL', 'LENGTH', 'LN', 'LOCATE', 'LOG', 'LOG10', 'LOG2', 'LOWER', 'LTRIM', 'MONTHNAME', 'NULLIF', 'PI', 'POW', 'POWER', 'RADIANS', 'RAND', 'REPLACE', 'ROUND', 'RTRIM', 'SIGN', 'SIN', 'SQRT', 'SUBDATE', 'TAN', 'TIME', 'TIME_TO_SEC', 'TIMESTAMP', 'TRUNCATE', 'TO_DAYS', 'UPPER', 'D', 'T', 'TS', 'DENSE_RANK', 'RANK', 'ROW_NUMBER', 'FIELD', 'MATCHPHRASE', 'MATCH_PHRASE', 'SIMPLE_QUERY_STRING', 'MULTI_MATCH', 'SUBSTR', 'STRCMP', 'ADDDATE', '*', '+', '-', 'MOD', '.', '(', '0', '1', '2', STRING_LITERAL, DECIMAL_LITERAL, REAL_LITERAL, ID, DOUBLE_QUOTE_ID, BACKTICK_QUOTE_ID}
[2022-08-10T13:22:45,833][WARN ][o.o.s.l.e.f.PrettyFormatRestExecutor] [opensearch-node1] Error happened in pretty formatter
java.lang.IllegalStateException: invalid value operation on MISSING_VALUE
	at org.opensearch.sql.legacy.expression.model.ExprValue.value(ExprValue.java:14) ~[legacy-2.1.0.0.jar:?]
	at org.opensearch.sql.legacy.executor.format.BindingTupleResultSet.lambda$buildDataRows$1(BindingTupleResultSet.java:49) ~[legacy-2.1.0.0.jar:?]
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197) ~[?:?]
	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625) ~[?:?]
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509) ~[?:?]
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499) ~[?:?]
	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921) ~[?:?]
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:?]
	at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682) ~[?:?]
	at org.opensearch.sql.legacy.executor.format.BindingTupleResultSet.buildDataRows(BindingTupleResultSet.java:56) ~[legacy-2.1.0.0.jar:?]
	at org.opensearch.sql.legacy.executor.format.BindingTupleResultSet.<init>(BindingTupleResultSet.java:28) ~[legacy-2.1.0.0.jar:?]
	at org.opensearch.sql.legacy.executor.format.Protocol.loadResultSet(Protocol.java:87) ~[legacy-2.1.0.0.jar:?]
	at org.opensearch.sql.legacy.executor.format.Protocol.<init>(Protocol.java:62) ~[legacy-2.1.0.0.jar:?]
	at org.opensearch.sql.legacy.executor.format.PrettyFormatRestExecutor.execute(PrettyFormatRestExecutor.java:70) [legacy-2.1.0.0.jar:?]
	at org.opensearch.sql.legacy.executor.format.PrettyFormatRestExecutor.execute(PrettyFormatRestExecutor.java:43) [legacy-2.1.0.0.jar:?]
	at org.opensearch.sql.legacy.executor.AsyncRestExecutor.doExecuteWithTimeMeasured(AsyncRestExecutor.java:150) [legacy-2.1.0.0.jar:?]
	at org.opensearch.sql.legacy.executor.AsyncRestExecutor.lambda$async$1(AsyncRestExecutor.java:110) [legacy-2.1.0.0.jar:?]
	at org.opensearch.sql.legacy.utils.LogUtils.lambda$withCurrentContext$1(LogUtils.java:59) [legacy-2.1.0.0.jar:?]
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:739) [opensearch-2.1.0.jar:2.1.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:833) [?:?]
@deathjoin deathjoin added bug Something isn't working untriaged labels Aug 10, 2022
@dai-chen
Copy link
Collaborator

@deathjoin Thanks for reporting the issue. Will investigate and update!

@dai-chen
Copy link
Collaborator

After investigation, the root cause turns out that TYPE is reversed keyword in relevancy search function argument name: https://github.com/opensearch-project/sql/blob/main/sql/src/main/antlr/OpenSearchSQLParser.g4#L434. There are 2 workarounds as below:

Workaround-1: Use other field name

PUT test-nulls

POST _bulk
{ "create": { "_index": "test-nulls", "_id": "1" } }
{ "types": "first", "message": "Hello" }
{ "create": { "_index": "test-nulls", "_id": "2" } }
{ "types": "first", "message": "Hello" }
{ "create": { "_index": "test-nulls", "_id": "3" } }
{ "types": "second", "message": "World" }
{ "create": { "_index": "test-nulls", "_id": "4" } }
{ "message": "Whatever" }

POST _plugins/_sql
{
    "query": "SELECT types, COUNT(1) FROM test-nulls GROUP BY types"
}

{
  "schema": [
    {
      "name": "types",
      "type": "text"
    },
    {
      "name": "COUNT(1)",
      "type": "integer"
    }
  ],
  "datarows": [
    [
      null,
      1
    ],
    [
      "first",
      2
    ],
    [
      "second",
      1
    ]
  ],
  "total": 3,
  "size": 3,
  "status": 200
}

Workaround-2: Quote keyword field name

POST _plugins/_sql
{
    "query": "SELECT `type`, COUNT(*) FROM test-nulls GROUP BY `type`"
}

{
  "schema": [
    {
      "name": "type",
      "type": "text"
    },
    {
      "name": "COUNT(*)",
      "type": "integer"
    }
  ],
  "datarows": [
    [
      null,
      1
    ],
    [
      "first",
      2
    ],
    [
      "second",
      1
    ]
  ],
  "total": 3,
  "size": 3,
  "status": 200
}

@dai-chen
Copy link
Collaborator

dai-chen commented Aug 15, 2022

For permanent improvement, I think common keywords like TYPE should be allowed to use as identifier. This can be done by adding it here: https://github.com/opensearch-project/sql/blob/main/sql/src/main/antlr/OpenSearchSQLIdentifierParser.g4#L60.

@dai-chen dai-chen added enhancement New feature or request SQL v2.3.0 'Issues and PRs related to version v2.3.0' and removed bug Something isn't working untriaged labels Aug 15, 2022
@Yury-Fridlyand
Copy link
Collaborator

We have another bug for this: #712.
This should be fixed in PPL too, the fix is pretty simple.

@dai-chen dai-chen added the legacy Issues related to legacy query engine to be deprecated label Aug 23, 2022
@gaiksaya gaiksaya mentioned this issue Sep 14, 2022
22 tasks
@bbarani bbarani added v2.4.0 'Issues and PRs related to version v2.4.0' and removed v2.3.0 'Issues and PRs related to version v2.3.0' labels Oct 20, 2022
@bbarani
Copy link
Member

bbarani commented Oct 20, 2022

@dai-chen @Yury-Fridlyand Re-tagging this issue to 2.4.0 release.

@dai-chen dai-chen removed the v2.4.0 'Issues and PRs related to version v2.4.0' label Oct 28, 2022
@dai-chen
Copy link
Collaborator

Removing 2.4.0 tag. As @Yury-Fridlyand pointed out, this may be fixed along with other bug together.

@dai-chen dai-chen added the v2.5.0 'Issues and PRs related to version v2.5.0' label Dec 15, 2022
@dai-chen dai-chen self-assigned this Dec 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request legacy Issues related to legacy query engine to be deprecated SQL v2.5.0 'Issues and PRs related to version v2.5.0'
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants