Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add unicode-case and unicode-perl features to the regex dependency #2566

Merged
merged 4 commits into from
Apr 21, 2023
Merged

Add unicode-case and unicode-perl features to the regex dependency #2566

merged 4 commits into from
Apr 21, 2023

Conversation

dbidwell94
Copy link
Contributor

@dbidwell94 dbidwell94 commented Apr 21, 2023

Regex parse error

Motivation

tracing-subscriber-0.3.16\src\filter\env\directive.rs:140:10 fails to parse regex in certain cases

Solution

Refactor referenced regex so that unicode-case and unicode-perl is not required.

@BurntSushi
Copy link

BurntSushi commented Apr 21, 2023

I don't believe this is quite sufficient. At least, it isn't for me when I run cargo test locally. You also need unicode-perl since y'all are using \w and \s.

With that said, an alternative here is to not add any new features and tweak the regexes so that they don't require huge Unicode tables. There are a couple ways of doing that, but here's one:

diff --git a/tracing-subscriber/src/filter/env/directive.rs b/tracing-subscriber/src/filter/env/directive.rs
index f0ccbd7e..1d332d32 100644
--- a/tracing-subscriber/src/filter/env/directive.rs
+++ b/tracing-subscriber/src/filter/env/directive.rs
@@ -122,15 +122,15 @@ impl Directive {
     pub(super) fn parse(from: &str, regex: bool) -> Result<Self, ParseError> {
         static DIRECTIVE_RE: Lazy<Regex> = Lazy::new(|| Regex::new(
             r"(?x)
-            ^(?P<global_level>(?i:trace|debug|info|warn|error|off|[0-5]))$ |
+            ^(?P<global_level>(?i-u:trace|debug|info|warn|error|off|[0-5]))$ |
                 #                 ^^^.
                 #                     `note: we match log level names case-insensitively
             ^
             (?: # target name or span name
-                (?P<target>[\w:-]+)|(?P<span>\[[^\]]*\])
+                (?P<target>[[:word:]:-]+)|(?P<span>\[[^\]]*\])
             ){1,2}
             (?: # level or nothing
-                =(?P<level>(?i:trace|debug|info|warn|error|off|[0-5]))?
+                =(?P<level>(?i-u:trace|debug|info|warn|error|off|[0-5]))?
                     #          ^^^.
                     #              `note: we match log level names case-insensitively
             )?
@@ -151,7 +151,7 @@ impl Directive {
                     (?:=[^,]+)?
                 )
                 # trailing comma or EOS
-                (?:,\s?|$)
+                (?:,[[:space:]]?|$)
             "#).unwrap());
 
         let caps = DIRECTIVE_RE.captures(from).ok_or_else(ParseError::new)?;

This patch results in all tests passing locally for me without adding any new regex crate features.

Do note though that this does subtly change the semantics of what the regex matches. My intuition says this is probably fine here, but that's up to y'all to decide. :-)

jjant added a commit to smithy-lang/smithy-rs that referenced this pull request Apr 21, 2023
Workaround until tokio-rs/tracing#2566
fixes this.
#2566 (comment)

 ## Motivation

Missing features for the `regex` crate was causing build time errors
due to the the use of unicode characters in the regex without using
the proper features within the regex crate

 ## Solution

Refactor regex so that no external feature are required, but regex
is still valid and tests still pass

Authored-by: Devin Bidwell <[email protected]>
@dbidwell94
Copy link
Contributor Author

The force push above is the 3 commits squashed into one

@dbidwell94 dbidwell94 changed the title Add unicode-case feature by default to regex dependency to fix regex … refactor regex so as to not need unicode-case and unicode-perl regex features Apr 21, 2023
@davidbarsky
Copy link
Member

davidbarsky commented Apr 21, 2023

The force push above is the 3 commits squashed into one

you don't need to force-push; we squash on merge.

(I'm running CI and looking at this PR now.)

@dbidwell94
Copy link
Contributor Author

Yeah those tests were failing on my end as well. Not sure why, untouched by me.

@davidbarsky
Copy link
Member

Do note though that this does subtly change the semantics of what the regex matches. My intuition says this is probably fine here, but that's up to y'all to decide. :-)

@BurntSushi Out of curiosity, what is the subtle change? It's disabling unicode case folding?

@davidbarsky
Copy link
Member

Yeah those tests were failing on my end as well. Not sure why, untouched by me.

I'm not entirely sure why the UI tests started failing all. Digging into it.

@BurntSushi
Copy link

@davidbarsky The change is that your regex matches fewer strings than it did before. (?i:trace|debug|info|warn|error|off) and (?i-u:trace|debug|info|warn|error|off) actually happen to match the same things. But it depends. For example, (?i:s) is equivalent to [Ssſ], but (?i-u:s) is equivalent to [Ss]. Notice the former includes ſ (U+017F).

So you're fine there. But \s is Unicode aware and [[:space:]] is not. Similarly for \w. So \w will match β but [[:word:]] (or equivalently, (?-u:\w)) will not match β. \s also contains some non-ASCII Unicode codepoints for whitespace such as U+2000.

So if you specifically wanted \w to match a whole bunch of stuff other than [0-9A-Za-z_], then yeah, you should keep it and enable the appropriate Unicode features to bring in the requisite Unicode tables.

The sad thing is that you'll still have a dependency on matchers, which in turn depends on an older version of regex-syntax. So you're already bringing in a whole bunch of Unicode tables that you probably aren't using.

@davidbarsky
Copy link
Member

@davidbarsky The change is that your regex matches fewer strings than it did before. (?i:trace|debug|info|warn|error|off) and (?i-u:trace|debug|info|warn|error|off) actually happen to match the same things. But it depends. For example, (?i:s) is equivalent to [Ssſ], but (?i-u:s) is equivalent to [Ss]. Notice the former includes ſ (U+017F).

So you're fine there. But \s is Unicode aware and [[:space:]] is not. Similarly for \w. So \w will match β but [[:word:]] (or equivalently, (?-u:\w)) will not match β. \s also contains some non-ASCII Unicode codepoints for whitespace such as U+2000.

So if you specifically wanted \w to match a whole bunch of stuff other than [0-9A-Za-z_], then yeah, you should keep it and enable the appropriate Unicode features to bring in the requisite Unicode tables.

The sad thing is that you'll still have a dependency on matchers, which in turn depends on an older version of regex-syntax. So you're already bringing in a whole bunch of Unicode tables that you probably aren't using.

Gotcha, thanks for explaining this! I'm not especially good at understanding regexes 😅

@BurntSushi
Copy link

Aye. If it's any consolation this is less about regexes and more about the intersection of regexes and Unicode. It's a difficult beast to tame!

@dbidwell94
Copy link
Contributor Author

So it sounds like it might actually be best to enable the features instead of refactor the original regexes?

@BurntSushi
Copy link

(@dbidwell94 That question I can't answer. I just wanted to put the tweaked regex out there as an alternative. It's up to y'all which behavior you want.)

@mati865
Copy link

mati865 commented Apr 21, 2023

From a user perspective I doubt Unicode even makes sense here. In Rust you cannot name functions with non-ASCII characters so there should be no real-world use case where one wants to enable specific log level for some weird path.

@davidbarsky
Copy link
Member

We'll end up enabling the feature for now and refactoring the regex later.

@hawkw
Copy link
Member

hawkw commented Apr 21, 2023

From a user perspective I doubt Unicode even makes sense here. In Rust you cannot name functions with non-ASCII characters so there should be no real-world use case where one wants to enable specific log level for some weird path.

Also, the case-insensitivity is being used for the part of the regex which parses the name of the log level, rather than the module path. The log levels themselves are the strings "trace", "debug", "info", "warn", "error", and "off", which should always be ASCII...

@hawkw
Copy link
Member

hawkw commented Apr 21, 2023

Let's get a release out with this change, and then revisit refactoring the regex later!

@hawkw hawkw enabled auto-merge (squash) April 21, 2023 18:58
@dbidwell94
Copy link
Contributor Author

@hawkw Not quite sure why those tests are failing. @davidbarsky said he was going to look into it.

@hawkw
Copy link
Member

hawkw commented Apr 21, 2023

@hawkw Not quite sure why those tests are failing. @davidbarsky said he was going to look into it.

These are UI tests for proc-macro error output. My guess is that they are failing because the compiler's error output changed somewhat on the latest stable version. @davidbarsky, we can probably fix these by running the UI tests with TRYBUILD=overwrite, to update the expected output to match the current stable compiler's output, and then committing the results.

@dbidwell94
Copy link
Contributor Author

@hawkw Worked for me locally. Committing and pushing now.

auto-merge was automatically disabled April 21, 2023 19:31

Head branch was pushed to by a user without write access

@dbidwell94 dbidwell94 changed the title refactor regex so as to not need unicode-case and unicode-perl regex features Add unicode-case and unicode-perl features to the regex dependency Apr 21, 2023
@dbidwell94
Copy link
Contributor Author

PR which fixes the UI tests (will need to be force merged as the build will fail due to missing regex features)
#2568

@hawkw hawkw enabled auto-merge (squash) April 21, 2023 20:58
@hawkw hawkw merged commit 998774e into tokio-rs:master Apr 21, 2023
hawkw pushed a commit that referenced this pull request Apr 21, 2023
… dependency (#2566)

 ## Motivation

Missing features for the `regex` crate were causing build time errors
due to the the use of unicode characters in the regex without using
the proper features within the regex crate

 ## Solution

Add the missing feature flags.

Fixes #2565

Authored-by: Devin Bidwell <[email protected]>
hawkw pushed a commit that referenced this pull request Apr 21, 2023
… dependency (#2566)

 ## Motivation

Missing features for the `regex` crate were causing build time errors
due to the the use of unicode characters in the regex without using
the proper features within the regex crate

 ## Solution

Add the missing feature flags.

Fixes #2565

Authored-by: Devin Bidwell <[email protected]>
hawkw pushed a commit that referenced this pull request Apr 21, 2023
… dependency (#2566)

 ## Motivation

Missing features for the `regex` crate were causing build time errors
due to the the use of unicode characters in the regex without using
the proper features within the regex crate

 ## Solution

Add the missing feature flags.

Fixes #2565

Authored-by: Devin Bidwell <[email protected]>
hawkw added a commit that referenced this pull request Apr 22, 2023
# 0.3.17 (April 21, 2023)

This release of `tracing-subscriber` fixes a build error when using
`env-filter` with recent versions of the `regex` crate. It also
introduces several minor API improvements.

### Fixed

- **env-filter**: Add "unicode-case" and "unicode-perl" to the `regex`
  dependency, fixing a build error with recent versions of `regex`
  (#2566)
- A number of minor documentation typos and other fixes (#2384,
  #2378, #2368, #2548)

### Added

- **filter**: Add `fmt::Display` impl for `filter::Targets` ([2343)
- **fmt**: Made `with_ansi(false)` no longer require the "ansi" feature,
  so that ANSI formatting escapes can be disabled without requiring
  ANSI-specific dependencies (#2535)

### Changed

- **fmt**: Dim targets in the `Compact` formatter, matching the default
  formatter (#2409)

Thanks to @keepsimple1, @andrewhalle, @LeoniePhiline, @LukeMathWalker,
@howardjohn, @daxpedda, and @dbidwell94 for contributing to this
release!
hawkw added a commit that referenced this pull request Apr 22, 2023
# 0.3.17 (April 21, 2023)

This release of `tracing-subscriber` fixes a build error when using
`env-filter` with recent versions of the `regex` crate. It also
introduces several minor API improvements.

### Fixed

- **env-filter**: Add "unicode-case" and "unicode-perl" to the `regex`
  dependency, fixing a build error with recent versions of `regex`
  (#2566)
- A number of minor documentation typos and other fixes (#2384, #2378,
  #2368, #2548)

### Added

- **filter**: Add `fmt::Display` impl for `filter::Targets` (#2343)
- **fmt**: Made `with_ansi(false)` no longer require the "ansi" feature,
  so that ANSI formatting escapes can be disabled without requiring
  ANSI-specific dependencies (#2532)

### Changed

- **fmt**: Dim targets in the `Compact` formatter, matching the default
  formatter (#2409)

Thanks to @keepsimple1, @andrewhalle, @LeoniePhiline, @LukeMathWalker,
@howardjohn, @daxpedda, and @dbidwell94 for contributing to this
release!
s1gtrap added a commit to s1gtrap/trunk that referenced this pull request Apr 22, 2023
s1gtrap added a commit to s1gtrap/trunk that referenced this pull request Apr 25, 2023
tgeoghegan added a commit to divviup/divviup-api that referenced this pull request May 2, 2023
`sea-orm-migration` 0.11.2 pulls in a version of `tracing-subscriber`
with a bug that stops migrations from working. See [1], [2], [3] for
context.

[1]: https://github.com/SeaQL/sea-orm/releases/tag/0.11.3
[2]: SeaQL/sea-orm#1608
[3]: tokio-rs/tracing#2566
tgeoghegan added a commit to divviup/divviup-api that referenced this pull request May 2, 2023
`sea-orm-migration` 0.11.2 pulls in a version of `tracing-subscriber`
with a bug that stops migrations from working. See [1], [2], [3] for
context.

[1]: https://github.com/SeaQL/sea-orm/releases/tag/0.11.3
[2]: SeaQL/sea-orm#1608
[3]: tokio-rs/tracing#2566
s1gtrap added a commit to s1gtrap/trunk that referenced this pull request Jun 4, 2023
hongquan added a commit to hongquan/tracing that referenced this pull request Oct 10, 2023
This release of `tracing-subscriber` fixes a build error when using
`env-filter` with recent versions of the `regex` crate. It also
introduces several minor API improvements.

- **env-filter**: Add "unicode-case" and "unicode-perl" to the `regex`
  dependency, fixing a build error with recent versions of `regex`
  (tokio-rs#2566)
- A number of minor documentation typos and other fixes (tokio-rs#2384, tokio-rs#2378,
  tokio-rs#2368, tokio-rs#2548)

- **filter**: Add `fmt::Display` impl for `filter::Targets` (tokio-rs#2343)
- **fmt**: Made `with_ansi(false)` no longer require the "ansi" feature,
  so that ANSI formatting escapes can be disabled without requiring
  ANSI-specific dependencies (tokio-rs#2532)

- **fmt**: Dim targets in the `Compact` formatter, matching the default
  formatter (tokio-rs#2409)

Thanks to @keepsimple1, @andrewhalle, @LeoniePhiline, @LukeMathWalker,
@howardjohn, @daxpedda, and @dbidwell94 for contributing to this
release!
kaffarell pushed a commit to kaffarell/tracing that referenced this pull request May 22, 2024
# 0.3.17 (April 21, 2023)

This release of `tracing-subscriber` fixes a build error when using
`env-filter` with recent versions of the `regex` crate. It also
introduces several minor API improvements.

### Fixed

- **env-filter**: Add "unicode-case" and "unicode-perl" to the `regex`
  dependency, fixing a build error with recent versions of `regex`
  (tokio-rs#2566)
- A number of minor documentation typos and other fixes (tokio-rs#2384, tokio-rs#2378,
  tokio-rs#2368, tokio-rs#2548)

### Added

- **filter**: Add `fmt::Display` impl for `filter::Targets` (tokio-rs#2343)
- **fmt**: Made `with_ansi(false)` no longer require the "ansi" feature,
  so that ANSI formatting escapes can be disabled without requiring
  ANSI-specific dependencies (tokio-rs#2532)

### Changed

- **fmt**: Dim targets in the `Compact` formatter, matching the default
  formatter (tokio-rs#2409)

Thanks to @keepsimple1, @andrewhalle, @LeoniePhiline, @LukeMathWalker,
@howardjohn, @daxpedda, and @dbidwell94 for contributing to this
release!
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants