Update regex-automata to 0.3 #4
cc @hawkw
Ping @hawkw, we want this.
Hmm, I'm not sure how I feel about the use of universal_start_state here, given that the regex-automata documentation states: "It is always correct for implementations to return None, and indeed, this is what the default implementation does. When this returns None, callers must use either start_state_forward or start_state_reverse to get the starting state."
It's a bit unfortunate that we can no longer construct an automaton that is guaranteed to have a universal start state. I wonder if it's possible to use the Input API with the first chunk of input to generate the start state when we start writing input to the Matcher for the first time, and then continue feeding individual bytes. I'm not sure whether the regex-automata Input API works correctly if we do this, so I don't know if this is actually a good idea; we'd have to test that we can still match expressions correctly in that case.
Alternatively, we could look into whether it's possible to configure the DFA Builder to reject any patterns that would lack a universal start state.
Honestly, I'm not sure whether the regex-automata 0.3 API lets us do what we want correctly. It seems like the expectation is that the DFA is always given the complete haystack in the form of an Input. Whatever approach we end up pursuing, we probably need to add tests for more complex patterns to make sure our somewhat unorthodox use of the regex-automata 0.3 API works correctly.
src/lib.rs
let automaton = dense::DFA::new(pattern)?;
let start = automaton
    .universal_start_state(Anchored::No)
    .expect("I hope this works");
I think we should probably return our own error type: a #[non_exhaustive] enum holding either a dense::BuildError or our own variant indicating that the pattern is invalid because the automaton would have no universal start state. That way, we can remove the expect here.
I played around with that idea a little bit and made a rather ugly but kind of working POC you can take a look at. I changed the
Have not looked into that yet. I really wonder how the old version of
Yep. The simplest way would of course be to reject every pattern that does not have a universal start state, but that might break existing usages of this library. So I think it's worth exploring some options.
Well, I think the
Yeah, I was talking more about breaking existing patterns, but of course this is a breaking change no matter what. The main motivation of this PR was to get rid of the duplicate
I had a quick look at
Note that regex-automata 0.4 has been released and is now used by the latest version of the regex crate. So if you want to eliminate duplicate dependencies, that is the version you would want to move to.
So I added the error enum and bumped the regex-automata version, and it didn't break anything (I haven't really looked at what actually changed). I also ran the
Hi there! Is there any progress on this? I ask as a non-Rust-expert Debian developer for whom a version of matchers compatible with regex-automata 0.4 would be a great help! (It turns out that maturin depends on matchers, but Debian unstable/testing no longer supports regex-automata 0.1 😢) Many thanks!
Any progress on this or the other PR updating regex-automata to 0.4?
Well, I kind of don't know how to proceed with this. This seems to be the closest I could get to matching the old behavior while upgrading the version, and indeed at least the tests for tracing-subscriber were passing with this version last time I checked. Still, I'm not 100% sure it will work for every use case because of the start state problem. @hawkw, how do you think this PR could proceed?
#5 was merged.
This fixes #3. Once the dependency is updated in tracing-subscriber, it will no longer require regex-automata 0.1, only 0.3; as of now, enabling the env-filter feature (which uses this crate) pulls in both versions.
Regarding this PR: it is not quite ready yet, as I am not sure how to handle the start state stuff. Obviously the expects can't stay. Sadly, we can't construct a dense::Error ourselves, and I am not 100% convinced that also storing the starting state in the Pattern is a good idea. Maybe we pass a bool to the Matcher constructor?