Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Opensearch returns null for all columns not listed in GROUP BY statement #258

Open
Tracked by #817
Yury-Fridlyand opened this issue Nov 2, 2021 · 4 comments
Assignees
Labels
tdvt Tableau test framework

Comments

@Yury-Fridlyand
Copy link
Collaborator

Describe the bug
Opensearch returns null for all columns not listed in GROUP BY statement

To Reproduce
Steps to reproduce the behavior:

  1. Upload the test data using attached files or generate on your own
curl -s -H 'Content-Type: application/x-ndjson' -XPUT 'https://localhost:9200/calcs?pretty' -u admin:admin --insecure --data-binary @calcs_mappings.json
curl -s -H 'Content-Type: application/x-ndjson' -XPOST 'https://localhost:9200/calcs/_bulk?pretty' -u admin:admin --insecure --data-binary @calcs.json | grep -i error

See below for data sample:

opensearchsql> select `key` as KK, int1, bool0 from calcs;
fetched rows / total rows = 17/17
+-------+--------+---------+
| KK    | int1   | bool0   |
|-------+--------+---------|
| key00 | -3     | True    |
| key01 | -6     | False   |
| key02 | null   | null    |
| key04 | null   | False   |
| key05 | null   | null    |
| key06 | null   | True    |
| key08 | 3      | null    |
| key09 | 3      | null    |
| key10 | null   | True    |
| key15 | null   | False   |
| key03 | -4     | True    |
| key07 | 2      | False   |
| key11 | -8     | False   |
| key12 | null   | null    |
| key13 | null   | null    |
| key14 | null   | True    |
| key16 | -9     | null    |
+-------+--------+---------+
  1. Execute a query with GROUP BY statement
opensearchsql> select `key` as KK, int1 from calcs GROUP BY bool0;
fetched rows / total rows = 3/3
+------+--------+
| KK   | int1   |
|------+--------|
| null | null   |
| null | null   |
| null | null   |
+------+--------+

Expected behavior
In this very specific query at least key column shouldn't be empty (filled by nulls)

Plugins
The default set:

curl "localhost:9200/_cat/plugins?h=component&pretty=true"
mapper-murmur3
opensearch-alerting
opensearch-anomaly-detection
opensearch-asynchronous-search
opensearch-index-management
opensearch-job-scheduler
opensearch-knn
opensearch-notebooks
opensearch-performance-analyzer
opensearch-reports-scheduler
opensearch-security
opensearch-sql

Host/Environment (please complete the following information):
Docker container v.1.0.0

@Yury-Fridlyand Yury-Fridlyand added Beta bug Something isn't working untriaged labels Nov 2, 2021
@chloe-zh
Copy link
Contributor

chloe-zh commented Nov 2, 2021

Hi @Yury-Fridlyand i think it's probably because the group by item was not present in the select items, could you try the following query:

select `key` as KK, int1, bool0 from calcs GROUP BY bool0

@chloe-zh chloe-zh removed Beta untriaged bug Something isn't working labels Nov 2, 2021
@Yury-Fridlyand
Copy link
Collaborator Author

Dear @chloe-zh,
Sorry for providing a mislead example. Even if column specified in GROUP BY statement added to the result set, other columns have only nulls:

opensearchsql> select `key` as KK, int1, bool0 from calcs GROUP BY bool0;
fetched rows / total rows = 3/3
+------+--------+---------+
| KK   | int1   | bool0   |
|------+--------+---------|
| null | null   | null    |
| null | null   | False   |
| null | null   | True    |
+------+--------+---------+

Actually, different SQL servers (and different versions) have different behavior. Usually, server reject query if it has a column without an aggregation function and not listed in GROUP BY. Some servers/versions produce multi-line output (like GROUP BY replaced by ORDER BY) or apply default aggregation function (e.g. CONCAT).
It should be discussed and defined what result should returned in such case. Returning nulls is also an option, but it must be proven/confirmed as a best one or most suitable one.

@Yury-Fridlyand
Copy link
Collaborator Author

If aggregation function is specified, OpenSearch returns a valid result:

opensearchsql> select COUNT(`key`) as KK, COUNT(int1), bool0 from calcs GROUP BY bool0;
fetched rows / total rows = 3/3
+------+---------------+---------+
| KK   | COUNT(int1)   | bool0   |
|------+---------------+---------|
| 7    | 3             | null    |
| 5    | 3             | False   |
| 5    | 2             | True    |
+------+---------------+---------+
opensearchsql> select COUNT(`key`) as KK, int1, bool0 from calcs GROUP BY bool0;
fetched rows / total rows = 3/3
+------+--------+---------+
| KK   | int1   | bool0   |
|------+--------+---------|
| 7    | null   | null    |
| 5    | null   | False   |
| 5    | null   | True    |
+------+--------+---------+

@Yury-Fridlyand
Copy link
Collaborator Author

mysql> select `key` as `KK`, `int1`, `bool0` from calcs GROUP BY `bool0`;
ERROR 1055 (42000): Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'TDVT.calcs.key' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

@dai-chen dai-chen added the v2.5.0 'Issues and PRs related to version v2.5.0' label Dec 14, 2022
@dai-chen dai-chen self-assigned this Dec 14, 2022
@dai-chen dai-chen removed the v2.5.0 'Issues and PRs related to version v2.5.0' label Jan 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tdvt Tableau test framework
Projects
None yet
Development

No branches or pull requests

4 participants