-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix incorrect parse failure for a valid pre-release semver #70
Conversation
Oh wow, yeah, that dates from SemVer 1.0.0-beta. Thanks for the patch! |
Hrm. This actually breaks
That tests for the statement in semver 2.0.0 paragraph 9 that "Numeric identifiers MUST NOT include leading zeroes." Note that |
Thank you for the great product. We've been using this in github.com/CrowdStrike/perseus for a bit over 2 years now and this is the first issue we've run into. It only cropped up because someone tagged a Go module with that I apologize for not running the tests before submitting this PR but I'm a Go developer these days and don't have a machine set up to do C builds. |
I'm also mainly a Go developer; I only pretend to be a C programmer when I have no other choice :-) |
Turns out `patch` wasn't populated yet, so my last commit was scanning nothing. Finally found the bug: it need to start scanning at `atchar`, not `len - atchar`. Wild that this never came up before! Only found it just now by, at the last second, testing the example from theory#70: `0.0.0-00010101000000-000000000000`. Now included in the test corpus.
A pre-release "Numeric identifiers MUST NOT include leading zeroes," according to SemVer 2.0. The existing code had accounted for this limitation: when it found a pre-release starting with a 0 followed by a digit, it would scan ahead and allow the value if it found an alphabetic character. There were two bugs in the Implementation. First, in addition to alphabetic characters, SemVer allows dashes. The [project regex] matches `[a-zA-Z-]`. So add a check for a dash in addition to `isalpha()`. Thanks to Dylan Bourque for the pull request (theory#70). The other bug was that the scanning was was not starting from the character after the zero and following digit. It was, instead, starting from that position subtracted from the length. For example, if the pre-release started at character 7 and the semver was 12 characters long, it would start scanning at character 13, well *after* the start of the pre-release. If the pre-release started at character 8 and the semver was 12 characters long, it would start scanning from character 4, well *before* the start of the pre-release. It's a wonder no errors were previously discovered. Fix it by starting from the current character, and include some details in Changes to help find existing bad values.
Bah, what an annoying bug! Will release v0.40.0 shortly. Many thanks for the report and fix! |
I appreciate you closing the loop on this. 🍻 |
Glad to do it. Just be aware that you might have some existing prerelease versions that would be invalid after an upgrade. |
What's the pattern for the existing versions that are now invalid? We have ~65K rows in our database with pre-release versions, all of which came from Go modules. I feel like we'll be fine, but it's easy enough to check. Follow-up: I have 0 records in my database with a non-empty pre-release version that doesn't match the official regex. |
It's difficult to say, because it depends on the length of the semver and the pre-release. You're probably safe, but you can use a regex to check. From Changes, it will be something like: SELECT name, version FROM packages
WHERE version::text !~ '^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$'; If that returns no rows you should be fine. |
That's pretty much what I did, except I only checked the pre-release section SELECT count(*) FROM module_versions
WHERE
get_semver_prerelease(version) != '' AND
get_semver_prerelease(version) ~ '^(0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(\.(0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*$'; |
Oh yeah, I forgot about FWIW, this applies only to prereleases that start with a 0 and contains only digits. So something like this would probably work: SELECT count(*) FROM module_versions
WHERE get_semver_prerelease(version) ~ '^0\[0-9]+($|\+)'; |
This PR addresses an incorrect parse failure when a pre-release token has a leading zero and consists of only digits and a hyphen ('-'), like
0.0.0-00010101000000-000000000000
.Using v0.31.2 of the extension,
semver('0.0.0-00010101000000-000000000000')
reports the following error:The portion after the first hyphen constitutes a valid pre-release token (ASCII alphanumerics and hyphens) but is not a number so the Numeric identifiers MUST NOT include leading zeroes clause in the specification does not apply.
The submitted fix is to extend the
isalpha()
check in the look-ahead portion ofparse_semver()
to also check if the current character is a hyphen.