Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regex::is_match unexpectedly returning false with regex-syntax v0.8.0 #1103

Closed
edmorley opened this issue Oct 12, 2023 · 2 comments
Closed

Comments

@edmorley
Copy link

edmorley commented Oct 12, 2023

Hi πŸ˜„

First, thank you for a fantastic crate!

What version of regex are you using?

$ cargo tree
testcasev0.1.0 (/Users/emorley/src/testcase)
└── regex v1.10.0
    β”œβ”€β”€ aho-corasick v1.1.2
    β”‚   └── memchr v2.6.4
    β”œβ”€β”€ memchr v2.6.4
    β”œβ”€β”€ regex-automata v0.4.1
    β”‚   β”œβ”€β”€ aho-corasick v1.1.2 (*)
    β”‚   β”œβ”€β”€ memchr v2.6.4
    β”‚   └── regex-syntax v0.8.0
    └── regex-syntax v0.8.0

Describe the bug at a high level.

Using regex-syntax v0.8.0, a Regex::is_match that previously matched in regex-syntax v0.7.4 no longer matches.

Whilst the issue no longer reproduces as part of the regex-syntax v0.8.1 release (I would guess due to the revert in that release: #1102), I thought you might still wish to see the testcase in case it covers something that the newly added fuzz tests do not, ready for the future relanding of #1051.

What are the steps to reproduce the behavior?

  1. cargo new testcase && cd $_
  2. cargo add [email protected]
  3. cargo update -p regex-syntax --precise 0.8.0
  4. Add this to the end of main.rs:
    #[cfg(test)]
    mod tests {
        #[test]
        fn testcase() {
            assert!(regex::Regex::new(r"^[[:alnum:]./-]+$")
                .unwrap()
                .is_match("a-b"));
        }
    }
  5. cargo test
  6. cargo update -p regex-syntax --precise 0.8.1
  7. cargo test

What is the actual behavior?

  • cargo test at step 5 (ie: when using regex-syntax v0.8.0) fails with the output below
  • cargo test at step 7 (ie: when using regex-syntax v0.8.1) passes
---- tests::testcase stdout ----
thread 'tests::testcase' panicked at src/main.rs:9:9:
assertion failed: regex::Regex::new(r\"^[[:alnum:]./-]+$\").unwrap().is_match(\"a-b\")

What is the expected behavior?

cargo test passes regardless of regex-syntax version.

Notes

  • I'm using Rust 1.74.0-beta.1 on macOS on ARM
  • If I replace the [:alnum:] in the regex with eg a-z then the issue stops reproducing.
@BurntSushi
Copy link
Member

Thanks! I'll add this as a test case. There are probably a lot of them unfortunately. I clearly had too much confidence in the test suite. The issue is in the HIR:

$ regex-cli-interval-opts debug hir '^[[:alnum:]./-]+$'
    parse time:  16.109Β΅s
translate time:  8.883Β΅s

Concat(
    [
        Look(
            Start,
        ),
        Repetition(
            Repetition {
                min: 1,
                max: None,
                greedy: true,
                sub: Class(
                    {
                        '.'..='.',
                        '-'..='9',
                        'A'..='Z',
                        'a'..='z',
                    },
                ),
            },
        ),
        Look(
            End,
        ),
    ],
)

$ regex-cli debug hir '^[[:alnum:]./-]+$'
    parse time:  14.264Β΅s
translate time:  7.067Β΅s

Concat(
    [
        Look(
            Start,
        ),
        Repetition(
            Repetition {
                min: 1,
                max: None,
                greedy: true,
                sub: Class(
                    {
                        '-'..='9',
                        'A'..='Z',
                        'a'..='z',
                    },
                ),
            },
        ),
        Look(
            End,
        ),
    ],
)

The first command is with regex-syntax 0.8.0 while the latter is 0.8.1. You can see that the ranges are out of order. For example, . should just be part of the '-'..='9' range.

@edmorley
Copy link
Author

Thank you for taking a look and also explaining the issue :-)

Licheam added a commit to Licheam/regex that referenced this issue Oct 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants