Strange behavior with multiline Regex #87368
Comments
Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
Three points
Combining those three facts, in your case Then Split returns the pieces between those matches of So Split is working as specified. This fact that Regex is ignorant of
This is perhaps a little misleading, because when it says match, what it really means is "find a match at" rather than actually "match" because If you're now asking, what is the correct way to do your split while being tolerant of lines ending with
When someone implements #25598 you'll be able to use Split in the way you were expecting it to work.
As another option, just replace
Again, not sure whether you want the newlines in the matches or not so you'll need to adjust that. Yes, it should be easier -- this is tracked by #25598 Almost nobody would bother with the above -- I expect they'd just trim the results.
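For illustration, here is a minimal sketch of two ways to get clean pieces today: a pattern that consumes the whole GO line regardless of line ending, and the simpler "just trim the results" approach. The sample input, the class name `TolerantSplit`, and both helper methods are assumptions made for this sketch, not code taken from the comments above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TolerantSplit
{
    // Hypothetical helper: the match covers "GO", an optional \r, and an optional \n,
    // so the separator line is removed whether the text uses \r\n or \n.
    public static string[] SplitOnGoLine(string input) =>
        Regex.Split(input, @"(?m)^GO\r?$\n?");

    // Hypothetical helper: keep the pattern from the issue and trim leftover
    // \r and \n characters from both ends of each piece.
    public static string[] SplitAndTrim(string input) =>
        Regex.Split(input, @"(?m)^GO\r$")
             .Select(piece => piece.Trim('\r', '\n'))
             .ToArray();

    public static void Main()
    {
        string input = "Abc\r\nGO\r\nOno"; // assumed sample input

        // The piece before GO keeps its own line ending; the piece after GO is clean.
        foreach (string piece in SplitOnGoLine(input))
            Console.WriteLine($"[{piece.Replace("\r", "\\r").Replace("\n", "\\n")}]");

        // Trimming removes line endings from every piece.
        foreach (string piece in SplitAndTrim(input))
            Console.WriteLine($"[{piece}]");
    }
}
```

Whether the line ending of the preceding line should stay in the first piece depends on what you do with the pieces afterwards, so adjust the pattern or the trimming accordingly.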
Description
Let's say I have a multiline string as follows:
I want to split this string by the GO line. For some reason the Regex I am using matches only GO\r, and the rest ends up attached to the second part: \nOno. So I get a lone newline at the beginning of the string. This is unexpected, because according to the documentation I found, \r$ should be the right combination to match both. Of course I can work around this by using \r\n. Still, I wanted to clarify whether I'm misinterpreting something.
Note: There is at least one other variant of this regex, which behaves the same way: (?m)^GO\s$.
This variant fails to split the string at all, which I find surprising too: (?m)^GO$
Reproduction Steps
Run the following program and observe the hexadecimal representation of the Ono string. It has 0A (\n) attached at the beginning.
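The attached program is not reproduced in this copy of the issue. A minimal sketch that shows the same behavior might look like the following. It assumes the input is `"Abc\r\nGO\r\nOno"` (only the `Ono` part is known from the description); the class name `SplitRepro` and the `Hex` helper are made up for the sketch.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SplitRepro
{
    // Hypothetical helper: renders each character of a string as a two-digit hex code.
    public static string Hex(string s) =>
        string.Join(" ", s.Select(c => ((int)c).ToString("X2")));

    public static string[] SplitOn(string pattern, string input) =>
        Regex.Split(input, pattern);

    public static void Main()
    {
        string input = "Abc\r\nGO\r\nOno"; // assumed sample input with \r\n line endings

        // In multiline mode $ matches just before \n, so the match is "GO\r"
        // and the \n stays at the start of the second piece.
        foreach (string pattern in new[] { @"(?m)^GO\r$", @"(?m)^GO\s$", @"(?m)^GO$" })
        {
            string[] pieces = SplitOn(pattern, input);
            Console.WriteLine($"{pattern}: {pieces.Length} piece(s)");
            foreach (string piece in pieces)
                Console.WriteLine("  " + Hex(piece));
        }
        // For the first two patterns the second piece prints as 0A 4F 6E 6F.
        // For (?m)^GO$ there is no match, because \r sits between GO and the \n,
        // so Split returns the whole input as a single piece.
    }
}
```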
Expected behavior
No newline character at the beginning of split string
Actual behavior
Output of the attached program. Notice the additional printed newline.
Regression?
No response
Known Workarounds
No response
Configuration
.NET 7.0
Windows 10 Professional
Other information
No response