
Fix parser failure for character alias #201

Merged: 6 commits, Jan 17, 2022

mweisgut (Contributor) commented Jan 15, 2022

This PR fixes a parser failure that occurs when an SQL string uses character as an alias.

Example query:

SELECT student.id AS character FROM student;

The fault is related to the data type CHARACTER VARYING, which is a synonym for VARCHAR.
Before this PR, the scanner (Flex) matched CHARACTER and VARYING as two separate keyword tokens to support CHARACTER VARYING. Because CHARACTER was a keyword token, the alias character in the query above is scanned as the CHARACTER token instead of as an identifier. Parsing the query then fails, since the corresponding parser rule expects an IDENTIFIER token, not a CHARACTER token.

Most flex programs are quite ambiguous, with multiple patterns that can match the same input. Flex resolves the ambiguity with two simple rules:
• Match the longest possible string every time the scanner matches input.
• In the case of a tie, use the pattern that appears first in the program.

Levine, J. (2009). Flex & Bison: Text Processing Tools. O'Reilly Media, Inc., p. 22.

For the query above, the CHARACTER pattern and the IDENTIFIER pattern both match the longest possible string, character (nine characters). Because this is a tie, the CHARACTER rule is chosen over the IDENTIFIER rule since it appears first in the scanner specification.

This PR removes the CHARACTER and VARYING tokens and adds a CHARACTER_VARYING token. This token matches strings with the following pattern (one or more whitespace characters between the two words):
CHARACTER<whitespace>+VARYING

mweisgut (Contributor Author) commented Jan 17, 2022

If we want to support CHARACTER(N) in the future (#202), using start states in flex could be an option (see https://stackoverflow.com/questions/1130597/start-states-in-lex-flex).

@@ -211,6 +209,8 @@ ROLLBACK TOKEN(ROLLBACK)
COMMIT TOKEN(COMMIT)
INTERVAL TOKEN(INTERVAL)

CHARACTER[ \t]+VARYING TOKEN(CHARACTER_VARYING)
Contributor

I am totally fine with this version.
Normally, though, wouldn't we also have to allow newlines and comments between "CHARACTER" and "VARYING"?

Contributor Author

Good catch. I added newlines at least.

mweisgut merged commit f497192 into master on Jan 17, 2022