Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String.Split() trim last substring if specified #81331

Merged
merged 4 commits into from
Feb 16, 2023

Conversation

lordfanger
Copy link
Contributor

When String.Split() is called with null/empty array and whitespaces are used as separators, count is specified and TrimEntries are set the result is unexpected.

Actual behavior:
var x = "a b ".Split(Array.Empty(), 2, StringSplitOptions.TrimEntries); // x is ["a", "b "]
The last substring is not trimmed from end.

Changed behavior:
var x = "a b ".Split(Array.Empty(), 2, StringSplitOptions.TrimEntries); // x is ["a", "b"]
The last substring is trimmed correctly.

@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label.

@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Jan 29, 2023
@dnfadmin
Copy link

dnfadmin commented Jan 29, 2023

CLA assistant check
All CLA requirements met.

@danmoseley
Copy link
Member

Is this long standing behavior, e.g. .NET Framework even? Could code be broken by this change?

@am11
Copy link
Member

am11 commented Jan 30, 2023

Is this long standing behavior, e.g. .NET Framework even? Could code be broken by this change?

The behavior is present sinceTrimEntries field was added to StringSplitOptions in .NET 5.

@ghost
Copy link

ghost commented Jan 30, 2023

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.

Issue Details

When String.Split() is called with null/empty array and whitespaces are used as separators, count is specified and TrimEntries are set the result is unexpected.

Actual behavior:
var x = "a b ".Split(Array.Empty(), 2, StringSplitOptions.TrimEntries); // x is ["a", "b "]
The last substring is not trimmed from end.

Changed behavior:
var x = "a b ".Split(Array.Empty(), 2, StringSplitOptions.TrimEntries); // x is ["a", "b"]
The last substring is trimmed correctly.

Author: lordfanger
Assignees: -
Labels:

area-System.Runtime, community-contribution

Milestone: -

@stephentoub stephentoub added bug and removed bug labels Jan 30, 2023
@lordfanger
Copy link
Contributor Author

Is this long standing behavior, e.g. .NET Framework even? Could code be broken by this change?

It could break code that would rely on that behavior but it'll be very rare. But it needs all 3 conditions to be true:

  • Split by whitespace (rare)
  • Limit by count (not so often)
  • TrimEntries (only 2 years old)

@dakersnar
Copy link
Contributor

dakersnar commented Feb 10, 2023

The current behavior does seem pretty unintuitive.

I was poking around at this functionality and discovered that if you limit the count to 1, it correctly trims the whitespace. Any idea why that case works but 2 (or higher) doesn't?

var x = "a b ".Split(Array.Empty(), 1, StringSplitOptions.TrimEntries); // x is ["a b"]

Copy link
Contributor

@dakersnar dakersnar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if we are ok with the breaking change.

@danmoseley
Copy link
Member

too obscure to deserve breaking changes label / doc perhaps?

@lordfanger
Copy link
Contributor Author

I don't expect a lot broken codes (if any). The workaround is very simple. Trim the input string and it is ok (for performance friendly MemoryExtension overload it is super cheap anyway). No need to fill github issue perhaps. I was notified by failing tests for my IEnumerable version of string.Split() after upgrade to .NET 6 where compared results against core. Some cases were already fixed in #73194 but after upgrade to .NET 7 there were still some failing tests.

The doc already say that the substring will be trimmed if option flag is set and it is what this bugfix does.

I've looked at docs once more and only updates I suggests are for string.Split(char, int, StringSplitOptions) and string.Split(string, int, StringSplitOptions) where last statement in remarks says "If the string has already been split count - 1 times, but the end of the string has not been reached, then the last string in the returned array will contain this instance's remaining trailing substring, untouched.". But the last substring is touched if ends with whitespace and TrimEntries flag is set (these methods are untouched by this PR).

@lordfanger
Copy link
Contributor Author

I was poking around at this functionality and discovered that if you limit the count to 1, it correctly trims the whitespace. Any idea why that case works but 2 (or higher) doesn't?

The count 1 is special case and separator is ignored there, only postprocessing is performed if any (possible trimming is not mentioned in docs).

@stephentoub
Copy link
Member

The previous behavior was a bug; the method clearly wasn't doing what it was supposed to be doing (the caller asked for all of the segments to be trimmed and one wasn't in some corner cases). We don't need to proactively doc it.

@stephentoub stephentoub merged commit 00fe650 into dotnet:main Feb 16, 2023
@lordfanger lordfanger deleted the string-split branch February 16, 2023 15:10
@ghost ghost locked as resolved and limited conversation to collaborators Mar 18, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Runtime community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants