add regexp crate to Rust distribution (implements RFC 7) #13700
Maybe a silly question, but wouldn't it make sense to put Unicode character class support into the standard Rust string library?
Possibly. But I'm not sure. What would they be used for in Note that the matching algorithm depends on those Unicode classes to be available in sorted non-overlapping order, so that they are amenable to binary search. One possible path forward is to leave them in |
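To illustrate the point about sorted, non-overlapping ranges being amenable to binary search, here is a minimal sketch in present-day Rust. It is not the code from this PR: the table `SAMPLE_CLASS` and the function `class_contains` are hypothetical names, and the three ranges are illustrative only rather than a real Unicode class.

```rust
use std::cmp::Ordering;

/// Hypothetical table of inclusive (start, end) ranges. The ranges must be
/// sorted and non-overlapping for the binary search below to be valid.
const SAMPLE_CLASS: &[(char, char)] = &[('0', '9'), ('A', 'Z'), ('a', 'z')];

/// Returns true if `c` falls inside one of the ranges.
fn class_contains(ranges: &[(char, char)], c: char) -> bool {
    ranges
        .binary_search_by(|&(lo, hi)| {
            if hi < c {
                // The whole range lies before `c`: search to the right.
                Ordering::Less
            } else if lo > c {
                // The whole range lies after `c`: search to the left.
                Ordering::Greater
            } else {
                // lo <= c <= hi: `c` is inside this range.
                Ordering::Equal
            }
        })
        .is_ok()
}
```

Because no two ranges overlap, at most one range can compare as `Equal` for a given character, so a membership test such as `class_contains(SAMPLE_CLASS, 'q')` takes a logarithmic number of comparisons in the size of the table.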
//!
//! ## Matching one character
//!
//! <pre class="rust">
We've generally tried not to use HTML tags in our documentation. Was this done to avoid running the test/lexer over the contents? You may be able to get away with a `notrust` tag after three backticks.
Actually, the reasoning is more insidious: I was unable to write a plain `\` character in a fenced code block, so I resorted to the simpler solution of just writing the HTML. (I wasn't able to determine if this was a bug in the sundown parser or elsewhere...)
Oh well, it was worth a try!
This looks even better than I thought it was going to be, amazing work, and thank you so much! |
Ah, one more small thing, we're trying to ensure that commits can be traced back to the RFC they implemented, so could you make sure that this shows up at the bottom of the first commit message (you can wait to rebase until later)
|
None => "",
Some(ref h) => {
    match h.find(&name.to_owned()) {
        None => "",
Could you use `h.find_equiv(name)` here in order to avoid allocating an owned string?
Indeed I can. Fixed.
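For readers on present-day Rust, the same idea (looking up a string-keyed map with a borrowed key instead of allocating an owned one) can be sketched as follows. This is an analogue, not the code from this PR: `find_equiv` is the API named in the review above, while the sketch uses `HashMap::get` from today's standard library. The function name `lookup` and the `HashMap<String, String>` type are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical helper: returns the value stored under `name`, or "" if the
/// key is absent. No owned `String` is built for the lookup because
/// `HashMap<String, _>::get` accepts a `&str` key via the `Borrow` trait.
fn lookup<'a>(map: &'a HashMap<String, String>, name: &str) -> &'a str {
    match map.get(name) {
        Some(v) => v.as_str(),
        None => "",
    }
}
```

The allocating form would be `map.get(&name.to_owned())`, which builds a temporary `String` on every call only to hash and compare it.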
@alexcrichton Thanks! And thanks very much for all your comments so far. Very helpful. I will make sure to add Also, when I rebase, won't it change my commit history? I assume I'll have to force push. (Just want to make sure that's what's expected.) |
@BurntSushi: Yeah, you'll have to force push. |
// except according to those terms.

// ignore-stage1
// ignore-cross-compile #12102
The most recently landed PR actually makes it so that `ignore-cross-compile` isn't necessary. The stack of commits will need to be rebased anyway, so this is just something to include in the rebase.
Just a few small nits left, and otherwise this looks fantastic. After a rebase, I think this is good to go!
Argh, I didn't notice that when RFC 7 was accepted that it kept the name |
Whoa! Look at the ngrams with |
I just saw after a reload. Deleted my comment. |
C++ uses regex too. |
I don't really like I would not be opposed to naming the macro
(The .NET crowd is notably missing, but they call their module I don't know what it means to choose one name over another based on Google Trends telling me that there is a There seems to be a slight overall preference toward |
@BurntSushi There still remains the question of |
I prefer |
Rust convention is CamelCase for types. |
@seanmonstar Depends on whether you consider |
If we have |
Count me for that one, I think |
|
You meant the crate as |
@chris-morgan yes absolutely! Nice catch. Edited. |
OK, I've changed the name of the crate to |
};

/// For the `regex!` syntax extension. Do not use.
#[macro_registrar]
I'd just mark this as `#[doc(hidden)]`.
Fixed. Thanks!
Also adds a regex_macros crate, which provides natively compiled regular expressions with a syntax extension. Closes rust-lang#3591. RFC: 0007-regexps
Implements [RFC 7](https://github.com/rust-lang/rfcs/blob/master/active/0007-regexps.md) and will hopefully resolve #3591. The crate is marked as experimental. It includes a syntax extension for compiling regexps to native Rust code.

Embeds and passes the `basic`, `nullsubexpr` and `repetition` tests from [Glenn Fowler's (slightly modified by Russ Cox for leftmost-first semantics) testregex test suite](http://www2.research.att.com/~astopen/testregex/testregex.html). I've also hand-written a plethora of other tests that exercise Unicode support, the parser, public API, etc. Also includes a `regex-dna` benchmark for the shootout.

I know the addition looks huge at first, but consider these things:

1. More than half the number of lines is dedicated to Unicode character classes.
2. Of the ~4,500 lines remaining, 1,225 of them are comments.
3. Another ~800 are tests.
4. That leaves 2500 lines for the meat. The parser is ~850 of them. The public API, compiler, dynamic VM and code generator (for `regexp!`) make up the rest.
Nice work @BurntSushi! |