Skip to content
This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Field name is converted to lowercase if present in GROUP BY #373

Closed
dai-chen opened this issue Mar 3, 2020 · 11 comments
Closed

Field name is converted to lowercase if present in GROUP BY #373

dai-chen opened this issue Mar 3, 2020 · 11 comments
Labels
bug Something isn't working

Comments

@dai-chen
Copy link
Member

dai-chen commented Mar 3, 2020

Field name with function is supposed to be exactly same as what it is in SELECT. We've already implemented for simple case. However, it is converted to lowercase if used in GROUP BY.

PUT date_test
{
  "mappings": {
    "properties": {
      "birthday": {
        "type": "date"
      }
    }
  }
}
PUT date_test/_doc/1
{
  "birthday": "2019-09-10T10:00:00Z"
}

# Query without GROUP BY returns field name as expected, "DATE_format".
POST _opendistro/_sql
{
  "query": "SELECT DATE_format(birthday, 'yyyy-MM-dd', 'UTC') FROM date_test"
}
{
  "schema": [{
    "name": "DATE_format(birthday, 'yyyy-MM-dd', 'UTC')",
    "type": "text"
  }],
  "total": 1,
  "datarows": [["2019-09-10"]],
  "size": 1,
  "status": 200
}

# Field name is converted to lowercase "date_format" for aggregation
POST _opendistro/_sql
{
  "query": """
    SELECT DATE_format(birthday, 'yyyy-MM-dd', 'UTC')
    FROM date_test
    GROUP BY DATE_FORMAT(birthday, 'yyyy-MM-dd', 'UTC') 
    ORDER BY DATE_FORMAT(birthday, 'yyyy-MM-dd', 'UTC')
  """
}
{
  "schema": [{
    "name": "date_format(birthday)",
    "type": "text"
  }],
  "total": 1,
  "datarows": [["2019-09-10"]],
  "size": 1,
  "status": 200
}
@dai-chen dai-chen added the bug Something isn't working label Mar 3, 2020
@chenqi0805
Copy link
Contributor

I can pick up the issue as a practice. But need some direction. Here is a regression test to reproduce the bug:

@Test
public void testAbsWithIntFieldGroupBy() {
        JSONObject response = executeJdbcRequest("SELECT ABS(age) FROM " + TestsConstants.TEST_INDEX_ACCOUNT +
                " GROUP BY ABS(age) LIMIT 5");

        verifySchema(response, schema("ABS(age)", null, "long"));
    }

Diving into executeJdbcRequest, not sure how opendistrosql plugin is in-play here.

@dai-chen
Copy link
Member Author

dai-chen commented Mar 9, 2020

@chenqi0805 Thanks! When you run ./gradlew build, there is an Elasticsearch cluster running in your workspace (started by Gradle task integTestRunner), our executeJdbcRequest() in base class SQLIntegTestCase connects to the cluster and get your response.

@chenqi0805
Copy link
Contributor

chenqi0805 commented Mar 9, 2020

@dai-chen Is there a way I could monitor how different modules in sql are functioning in processing a sql request? Right now I lose track of what is happening in client().performRequest.

@dai-chen
Copy link
Member Author

dai-chen commented Mar 9, 2020

@chenqi0805 yes, it is because the request is being sent to ES cluster in your workspace via restful client. You should be able to find ES log in build/ folder, for example build/cluster/integTestCluster node0/elasticsearch-7.4.2/logs/_integTestCluster.log. This is helpful if you debug your integration test code. And it should be output to console by gradle if you run by ./gradlew integTestRunner.

@chenqi0805
Copy link
Contributor

chenqi0805 commented Mar 10, 2020

After some deeper dive, the cause of the bug is from the normalization rewriteFunctionNameToLowerCase being applied to the query block(SQLToOperatorConverter.visit), which converts all methodNames into lower case including both expressions in select and groupby. And those names will propagate into ColumnNodes. Thus the original methodNames string are lost. Now the tricky case is as follows:

POST _opendistro/_sql
{
  "query": """
    SELECT Length(Name.keyword)
    FROM date_test
    GROUP BY LENGTH(Name.keyword)
  """
}

where without lowercase normalization expressions in SELECT will fail to match those from GROUP BY due to the logic in SQLMethodInvokeExpr.equals, which leads to failure in JdbcTestIT.

Do not have a good strategy to fix yet. @dai-chen Suggestion is welcome.

@dai-chen
Copy link
Member Author

@chenqi0805 Thanks a lot for your investigation!

@penghuo Could you provide some suggestion when you're available?

@penghuo
Copy link
Contributor

penghuo commented Mar 13, 2020

For issued case, One solution is before make all the method to lowercase, we can keep the original select field and pass it to SQLAggregationParser. https://github.com/opendistro-for-elasticsearch/sql/blob/master/src/main/java/com/amazon/opendistroforelasticsearch/sql/query/planner/converter/SQLAggregationParser.java#L157
Which could make sure the ColumnNode.name and ColumnNode.alias always have the original value.

@chenqi0805
Copy link
Contributor

@penghuo Thanks. I will give it a try and see how many existing integration tests it might break.

@chenqi0805
Copy link
Contributor

Looking closer, one needs to deal with nested expression, e.g. cos(SIN(x)). So a tree like data structure seems necessary in keeping track of original function names.

Also, a related bug is that extra args are omitted in the expression Name:

nameOfExpr(((SQLMethodInvokeExpr) expr).getParameters().get(0)));
, this also manifest itself in the schema part of the posted snippet.

@penghuo
Copy link
Contributor

penghuo commented Mar 13, 2020

Yes, nameOfExpr should be rewrite to keep the function name and parameters as expected.

@chenqi0805
Copy link
Contributor

@penghuo Will open a separate issue to keep the workflow clean. Might work on that if I have time.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants