Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Replacer::by_ref adaptor to use a Replacer without consuming it #449

Closed
wants to merge 1 commit into from

Conversation

mbrubeck
Copy link
Contributor

@mbrubeck mbrubeck commented Feb 14, 2018

This is useful when you want to take a generic Replacer (which might not be cloneable) and use it without consuming it, so it can be used more than once.

Note: This can't simply return &mut Self because a generic impl<R: Replacer> Replacer for &mut R would conflict with libstd's generic impl<F: FnMut> FnMut for &mut F.

@mbrubeck
Copy link
Contributor Author

#83 proposed something similar but without the same motivation.

Copy link
Member

@BurntSushi BurntSushi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mbrubeck! I think the idea here seems OK to me, but do you have a specific example in mind where having this would help?

As far as the PR itself, it looks like this doesn't work on Rust 1.12. Is there a way to make this work for 1.12? If not, we'll probably have to leave this until I do the next version bump.

Also, could you wrap all the doc strings to 79 (inclusive) columns? Thanks.

@albel727
Copy link

Here's a specific problem that this solves.

https://play.rust-lang.org/?gist=3a4d8af0b9ee6b8faa984a6966eb2728&version=stable

@mbrubeck
Copy link
Contributor Author

I wrote this based on @albel727's code which was given in IRC. @albel727, is that based on a real-world use case, or is it an exercise/toy program?

Another use case might be applying the same replacement to many different input strings, though I don't have a more concrete example of this in mind.

@albel727
Copy link

It was a real use case, that I simplified into the example on the playground.

ReplacerRef allows to reuse the same Replacer in generic context, which in this case fulfills the need to repeatedly apply the same Regex and Replacer to a string until it stops applying, because the replacement may result in appearance of something that also needs to be replaced.

I think implementing such a fixpoint operator for regex replacement is a legitimate need. Consider this example, which is closer to what I needed.

Suppose you have something like "f(1)f(2)f(3)f(4)g(5)f(6)f(7)", and you need to merge adjacent f() contents, i.e. "f(1)f(2)f(3)" -> "f(123)", and "f(1)f(2)f(3)f(4)g(5)f(6)f(7)" -> "f(1234)g(5)f(67)".

It can't be written as a regex with single replacement, but can be written as a repeated replacement on pairs of adjacent f(), i.e.

loop { // while Regex applies
    s = Regex::new(r"f\(([^)]*)\)f\(([^)]*)\)").unwrap().replace(s, "f($1$2)");
}

@mbrubeck
Copy link
Contributor Author

As far as the PR itself, it looks like this doesn't work on Rust 1.12.

Fixed.

Also, could you wrap all the doc strings to 79 (inclusive) columns? Thanks.

Done.

@BurntSushi
Copy link
Member

All right, I think I'm OK with this. The use case seems a little out there, but since there's no other way to achieve it and this is a very small API addition, then I think I'm OK with this.

Thanks @mbrubeck!

@albel727
Copy link

Well, another way to define the use-case is emulating "overlapping replacement", with the repeated application of a non-overlapping replacement, which replace()/replace_all() are. I'm convinced it's actually a pretty common thing to want.

@albel727
Copy link

I'm not getting there, am I? 😄

Well, suppose you want to remove space between every two digits, e.g
"abc 1 2 3 de 4 5 fghi 6 7 8 9" ->
"abc 123 de 45 fghi 6789".

You can only do that with repeated application of replace_all(), replacing (\d)\s(\d) with $1$2. Single application would leave you with
"abc 12 3 de 45 fghi 67 89",
because pairs of digits overlap.

This is actually a simpler problem than what repeated application allows to solve in general, because it could be theoretically solved with a single replace_all() on a regex with lookahead. But many replacement problems can't be solved with lookahead, and Regex implementation doesn't have lookahead, afaik.

The want for repeated application often arises when one wants to cheat and transform some simple subset of context-free grammar language with regular expressions. E.g. merge contents of consecutive tags in some simple html, or merge consecutive extern "C" { fn f1() } extern "C" { fn f2() } extern "C" { fn f3() } in bindgen-generated Rust sources.

Alternative would be firing whole HTML/Rust parsers and transforming resulting ASTs, which is often an overkill for the task.

@BurntSushi
Copy link
Member

BurntSushi commented Feb 14, 2018

@albel727 No, I get it. :) I'm just confused at how you could possibly insist that something like this is common! I've been programming for a long time, and in all that time, neither myself nor any of the code I've read have ever needed something like this out of a regex library. Just because something is niche or uncommon doesn't mean I think we automatically shouldn't have it, but it definitely weighs in the cost-benefit analysis. The fact that you've presented some interesting examples, and have demonstrated that you can't really do what you want without this addition is sufficient enough for me (combined with the fact that this is a very simple and lightweight addition to the existing API).

But, it's impossible to truly know. After this makes it into the next release, let's come back here in two years and look at all of regex's dependents. Count me shocked if we find more than a single digit number of uses. :-)

Copy link
Member

@BurntSushi BurntSushi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mbrubeck Thanks so much for quickly updating the PR! I've added another round of nits. Thanks for your patience on this. :)

/// not be cloneable) and use it without consuming it, so it can be used
/// more than once:
///
/// ```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to nit at these things, but could you add an # Example header here to be consistent with how other examples are done? (I believe this is also the style in std too.)

/// let dst = re.replace_all(&dst, rep.by_ref());
/// dst.into_owned()
/// }
/// ```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thank you for writing out this example! Could you also add it to the re_bytes version too?

/// By-reference adaptor for a `Replacer`
///
/// Returned by [`Replacer::by_ref`](trait.Replacer.html#method.by_ref).
pub struct ReplacerRef<'a, R: ?Sized + 'a>(&'a mut R);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to add #[derive(Debug)] to this? (And do the same for the re_bytes module.)

Note: This can't simply return `&mut Self` because a generic
`impl<R: Replacer> Replacer for &mut R` would conflict with libstd's
generic `impl<F: FnMut> FnMut for &mut F`.
@mbrubeck
Copy link
Contributor Author

Nits fixed.

@albel727
Copy link

Alright, fairy nuff. :) Though I've needed this quite often in my years of programming, but maybe it's just me.

So now I can feel at rest, knowing that one will always be able to implement one-dimensional cellular automata with Regex, whenever they wanted. 🤣 Out of curiosity, when is this expected to land?

@BurntSushi BurntSushi closed this in 7f020b8 Mar 7, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants