Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Underscore in identifier is ignored by ANTLR parser #1133

Closed
dai-chen opened this issue Dec 1, 2022 · 6 comments
Closed

[BUG] Underscore in identifier is ignored by ANTLR parser #1133

dai-chen opened this issue Dec 1, 2022 · 6 comments
Assignees
Labels
bug Something isn't working legacy Issues related to legacy query engine to be deprecated SQL v2.7.0

Comments

@dai-chen
Copy link
Collaborator

dai-chen commented Dec 1, 2022

What is the bug?

For some reason, underscore in a query is ignored by ANTLR parser. Take the query in our doc for example, _id becomes id after parsing. This caused no fallback happen and failure in v2 engine since metadata field is not supported yet.

How can one reproduce the bug?

PUT sql-test/_doc/1
{ 
  "name": "John",
  "age": 10
}

POST _plugins/_sql
{
  "query" : """
    SELECT * FROM sql-test WHERE _id = 1
  """
}

org.opensearch.sql.exception.SemanticCheckException: can't resolve Symbol(namespace=FIELD_NAME, name=id) in type env
	at org.opensearch.sql.analysis.TypeEnvironment.resolve(TypeEnvironment.java:57) ~[core-2.4.0.0-SNAPSHOT.jar:?]
	at org.opensearch.sql.analysis.ExpressionAnalyzer.visitIdentifier(ExpressionAnalyzer.java:323) ~[core-2.4.0.0-SNAPSHOT.jar:?]
	at org.opensearch.sql.analysis.ExpressionAnalyzer.visitQualifiedName(ExpressionAnalyzer.java:297) ~[core-2.4.0.0-SNAPSHOT.jar:?]
	at org.opensearch.sql.analysis.ExpressionAnalyzer.visitQualifiedName(ExpressionAnalyzer.java:73) ~[core-2.4.0.0-SNAPSHOT.jar:?]
	at org.opensearch.sql.ast.expression.QualifiedName.accept(QualifiedName.java:111) ~[core-2.4.0.0-SNAPSHOT.jar:?]
	at org.opensearch.sql.analysis.ExpressionAnalyzer.analyze(ExpressionAnalyzer.java:90) ~[core-2.4.0.0-SNAPSHOT.jar:?]

What is the expected behavior?

User can use _id without issue for backward compatibility.

What is your host/environment?

  • Version 2.4.0

Do you have any additional context?

Not clear when is this broken. It is missed because all 5 IT on it are ignored. Because it's used in our documentation, we'd better fix it or the doc soon.

@dai-chen dai-chen added bug Something isn't working SQL labels Dec 1, 2022
@MitchellGale
Copy link
Contributor

I think these might be related issues:
#639
#783
#339

@dai-chen
Copy link
Collaborator Author

dai-chen commented Dec 2, 2022

@MitchellGale-BitQuill Thanks for the info! It's strange that the example query has been used in documentation since 1.0. However, I tested it and found same problem with 1.0 docker. Need to confirm and see if we need to fix in code or in doc.

@dai-chen dai-chen changed the title [BUG] Identifier _id is converted to id by ANTLR parser [BUG] Underscore in identifier is ignored by ANTLR parser Dec 2, 2022
@acarbonetto
Copy link
Collaborator

We would need to extend the parser like: Bit-Quill#142
Not sure if that's the direction we want to take this since this doesn't apply to other storage types

@dai-chen
Copy link
Collaborator Author

dai-chen commented Dec 5, 2022

We would need to extend the parser like: Bit-Quill#142 Not sure if that's the direction we want to take this since this doesn't apply to other storage types

yes, because of the backward compatibility, I think we need to support it. Otherwise users have to always back quote although it is more ANSI SQL standard. For other storages, I think it's fine and please see "hidden" columns in Presto Hive connector: https://prestodb.io/docs/current/connector/hive.html#extra-hidden-columns. Analyzer is supposed to throw exception if _id is used with S3 table because S3 storage doesn't return such column in getFieldTypes interface. The only drawback is it's semantic error rather than syntax error because we allow it in ANTLR grammar.

@acarbonetto
Copy link
Collaborator

Good to know. We'll push that PR forward since it solves a few bugs.
If, moving forward, we want a different solution for other storage engines, we can refactor later.

@dai-chen dai-chen added the legacy Issues related to legacy query engine to be deprecated label Dec 5, 2022
@dai-chen dai-chen added the v2.5.0 'Issues and PRs related to version v2.5.0' label Dec 12, 2022
@dai-chen dai-chen removed the v2.5.0 'Issues and PRs related to version v2.5.0' label Jan 6, 2023
@dai-chen dai-chen added the v2.6.0 label Feb 8, 2023
@dai-chen dai-chen added v2.7.0 and removed v2.6.0 labels Feb 16, 2023
@acarbonetto
Copy link
Collaborator

@dai-chen this can be closed as part of #1456
We now consider any field with _ so long as the field isn't reserved by OpenSearch.
Certain reserved fields are now supported (_id, _index, _score, _sort, _maxscore).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working legacy Issues related to legacy query engine to be deprecated SQL v2.7.0
Projects
None yet
Development

No branches or pull requests

4 participants