Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sure keywords are single upper words #107

Merged
merged 2 commits into from
May 30, 2024

Conversation

mvorisek
Copy link
Contributor

@mvorisek mvorisek commented May 17, 2024

Into 1.4.x as this is needed for #106 bugfix.

In this PR, we make sure all "reserved" keywords are single word - this is how SQL specifications (MySQL/MariaDB/SQLite at least) define them.

The "reserved" keywords are special as they need to be always quoted*. The "reserved" keyword list will be updated in #106 to contain all and only the "reserved" keywords.

* this is not only helpful for maintaining the lists properly, but also needed for #73 as only "reserved" keywords can be upper cased safely (issue #95).

@mvorisek mvorisek force-pushed the fix_reserved_words branch from 8bd332e to 78c8eee Compare May 17, 2024 11:30
@mvorisek mvorisek marked this pull request as ready for review May 17, 2024 11:31
tests/TokenizerTest.php Outdated Show resolved Hide resolved
@mvorisek mvorisek requested a review from greg0ire May 17, 2024 19:36
@mvorisek mvorisek force-pushed the fix_reserved_words branch 2 times, most recently from 790bb74 to c03edef Compare May 17, 2024 21:09
@greg0ire
Copy link
Member

Changes are explained by added tests & commit messages.

Changes are described, but they are not explained by commit messages (maybe by tests, but that's not sufficient I'm afraid). For instance, what makes you think keywords appearing in reservedTopLevel must also appear in reserved?

@mvorisek
Copy link
Contributor Author

what makes you think keywords appearing in reservedTopLevel must also appear in reserved?

"reserved" words are special - they needs to be quoted even if used on places where only identifier is possible (ex select x as AND imply a parse error for both MySQL and SQLite). This PR is cleanup of current code before #106 is landed.

In #106, "reserved" words will be asserted by complete lists.

@greg0ire
Copy link
Member

I'm asking because reserved and derived properties seems mostly used inside this code block, where being part of reserved does not seem to do much if you are already part of reservedTopLevel or reservedNewline:

// A reserved word cannot be preceded by a '.'
// this makes it so in "mytable.from", "from" is not considered a reserved word
if (! $previous || $previous->value() !== '.') {
$upper = strtoupper($string);
// Top Level Reserved Word
if (
preg_match(
'/^(' . $this->regexReservedToplevel . ')($|\s|' . $this->regexBoundaries . ')/',
$upper,
$matches,
)
) {
return new Token(
Token::TOKEN_TYPE_RESERVED_TOPLEVEL,
substr($string, 0, strlen($matches[1])),
);
}
// Newline Reserved Word
if (
preg_match(
'/^(' . $this->regexReservedNewline . ')($|\s|' . $this->regexBoundaries . ')/',
$upper,
$matches,
)
) {
return new Token(
Token::TOKEN_TYPE_RESERVED_NEWLINE,
substr($string, 0, strlen($matches[1])),
);
}
// Other Reserved Word
if (
preg_match(
'/^(' . $this->regexReserved . ')($|\s|' . $this->regexBoundaries . ')/',
$upper,
$matches,
)
) {
return new Token(
Token::TOKEN_TYPE_RESERVED,
substr($string, 0, strlen($matches[1])),
);
}
}

Is there a part of the code where being part of reserved matters even if you are already part of the two other properties?

@mvorisek mvorisek force-pushed the fix_reserved_words branch 2 times, most recently from 2c9a539 to 3eff766 Compare May 18, 2024 08:54
@mvorisek
Copy link
Contributor Author

mvorisek commented May 18, 2024

I'm asking because reserved and derived properties seems mostly used inside this code block, where being part of reserved does not seem to do much if you are already part of reservedTopLevel or reservedNewline:

I have now removed the test I added testing if all new line/top level keywords are also reserved, as in theory, this does not have to be true.

Currently the lists are in quite bad shape, for ex. WITH keyword is already present in reservedToplevel but also in reserved.

Reserved keywords should be maintained however on its own and they will be asserted in #106, the reason is "reserved" keyworda are special as they must be always quoted - and they will be needed to address #95. "reserved top level" and "reserved new line" are just bad historical names.

@mvorisek mvorisek marked this pull request as draft May 21, 2024 09:50
@mvorisek mvorisek force-pushed the fix_reserved_words branch from 3eff766 to 5473108 Compare May 21, 2024 10:01
@mvorisek mvorisek marked this pull request as ready for review May 21, 2024 10:02
@mvorisek
Copy link
Contributor Author

@greg0ire I have now amended the commits with explanation messages.

@mvorisek
Copy link
Contributor Author

@greg0ire I would like to continue with #106, please, can you review this bugfix PR?

Comment on lines 32 to 59
public function testKeywordsReservedAreUpperCasedAndWithoutWhitespace(): void
{
$tokenizerReserved = $this->getTokenizerList('reserved');

$kwsDiff = array_filter($tokenizerReserved, static function ($v) {
return $v !== strtoupper($v) || preg_match('~\s~', $v) !== 0;
});

self::assertSame([], $kwsDiff);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this method supposed to test? This is checking the format of the reserved keywords.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All reserved keywords must be a single word in upper case (so we can rely on it when comparing againts other UC string).

@mvorisek mvorisek requested a review from SenseException May 21, 2024 23:16
Copy link
Member

@goetas goetas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I do not see the need for this change (except of we want to keep the array of keywords alphabetically sorted)

return $res;
}

public function testKeywordsReservedAreUpperCasedAndWithoutWhitespace(): void
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this test is testing something "too" internal. they way how properties are used in the class should not be IMO really tested. changing the implementation at this level of depth should not really impact tests IMO

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

explained in #107 (comment)

src/Tokenizer.php Outdated Show resolved Hide resolved
@mvorisek mvorisek requested a review from goetas May 22, 2024 11:40
@goetas
Copy link
Member

goetas commented May 22, 2024

@mvorisek just some feedback, The commit 2f84308 makes it very hard to review your changes...

i see a lot of small changes even in the other two commits about sorting variables in the arrays that honestly add little value and makes the reviewer work much harder

@goetas
Copy link
Member

goetas commented May 22, 2024

Would it be possible to isolate the "real" changes you did from the "esthetical" changes?

@mvorisek
Copy link
Contributor Author

@mvorisek mvorisek force-pushed the fix_reserved_words branch 2 times, most recently from df99b3a to a47492d Compare May 24, 2024 07:49
@mvorisek mvorisek marked this pull request as ready for review May 24, 2024 07:50
@mvorisek mvorisek requested review from greg0ire and goetas May 24, 2024 07:55
@mvorisek
Copy link
Contributor Author

@greg0ire the sorting PR (#117) was merged, can this PR be merged as well?

@greg0ire
Copy link
Member

I don't know if @goetas @derrabus and @SenseException are satisfied with your answers. Since you can't get approvals, I am closing this PR and I suggest you do one PR per commit, so that there is more hope to close open discussions, and so that we do not have to look at too many other PRs and issue to understand what you are trying to do.

In these new PRs, try to be less pushy, and to explains things better, with clearer sentences. Take your time. I'll give you an example:

to be able to use faster CS comparison, they are always CI

Here it is unclear if you are referring to things before or after your changes, and it is unclear what you are referring to with "they".

@greg0ire greg0ire closed this May 25, 2024
@mvorisek
Copy link
Contributor Author

mvorisek commented May 26, 2024

@greg0ire please reopen this PR and I will extract the 2nd commit into another PR. Thank you.

@greg0ire greg0ire reopened this May 26, 2024
@mvorisek mvorisek marked this pull request as draft May 30, 2024 07:17
@mvorisek mvorisek force-pushed the fix_reserved_words branch from a47492d to cbe343f Compare May 30, 2024 07:18
@mvorisek mvorisek changed the title Fix reserved keywords lists Make sure keywords are single upper words May 30, 2024
@mvorisek mvorisek marked this pull request as ready for review May 30, 2024 07:28
@mvorisek mvorisek force-pushed the fix_reserved_words branch from cbe343f to 229f681 Compare May 30, 2024 07:32
@mvorisek mvorisek marked this pull request as draft May 30, 2024 07:34
@mvorisek mvorisek force-pushed the fix_reserved_words branch from 229f681 to 58998dd Compare May 30, 2024 07:35
@mvorisek mvorisek marked this pull request as ready for review May 30, 2024 07:36
@mvorisek
Copy link
Contributor Author

PR is done and I did my best to explain the change in the PR description.

@@ -182,13 +182,10 @@ final class Tokenizer
'MYISAM',
'NAMES',
'NATURAL',
'NO OTHERS',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I might not be understanding what you are doing here… shouldn't you be adding back NO as well as OTHERS to this list?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NO or OTHERS must not be added into "reserved" keywords as both are not "reserved" by SQL/MySQL spec.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure keywords are single upper words

Your commit message does not mention that at any point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description does. If the commit message should be amended, please tell me what text to put there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well maybe it should be a separate commit, since it's an entirely separate concern.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to end up exactly with the changes as this PR does - please tell me exactly what to do, am I right you want separate commit for 'NO OTHERS' keyword removal? Anything else?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 commits:

  • one that removes NO OTHERS and explains why, and also takes a guess at why NO OTHERS was present in this list in the first place
  • another that is about making sure we deal with single words, and also explains why that is necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done and thank you.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you think NO OTHERS was present in the list in the first place? You cite MySQL grammar, but this library is not solely about MySQL, is it?

Copy link
Contributor Author

@mvorisek mvorisek May 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you think NO OTHERS was present in the list in the first place?

See https://github.com/search?q=repo%3Amysql%2Fmysql-server+%22NO+OTHERS%22&type=code MySQL tests for usecases and https://github.com/antlr/grammars-v4/blob/4264f19a7d/sql/mysql/Oracle/MySQLParser.g4#L1208 for full grammar.

It is however not a "reserved" keyword because of two reasons. Keyword must be a single word. And must be listed as reserved in MySQL or MariaDB docs.

You cite MySQL grammar, but this library is not solely about MySQL, is it?

The jdorm library was for MySQL dialect only. In #106 many keywords will be added as the current list is far from complete.

mvorisek added 2 commits May 30, 2024 16:45
"NO" or "OTHERS" are not reserved keywords per MySQL grammar
The "reserved" keywords are special as they need to be always quoted and per MySQL/SQLite spec they are always single word and case insensitive.
@mvorisek mvorisek force-pushed the fix_reserved_words branch from 58998dd to ff667cc Compare May 30, 2024 14:48
@greg0ire greg0ire added this to the 1.4.1 milestone May 30, 2024
@greg0ire greg0ire merged commit 4605e91 into doctrine:1.4.x May 30, 2024
10 checks passed
@greg0ire
Copy link
Member

Thanks @mvorisek !

@mvorisek mvorisek deleted the fix_reserved_words branch May 30, 2024 16:24
@greg0ire greg0ire added the bug Something isn't working label Aug 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants