Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add trim/0, ltrim/0 and rtrim/0 #212

Merged
merged 1 commit into from
Sep 23, 2024
Merged

Add trim/0, ltrim/0 and rtrim/0 #212

merged 1 commit into from
Sep 23, 2024

Conversation

wader
Copy link
Contributor

@wader wader commented Sep 23, 2024

Trims leading and trailing whitespace.

Was added to jq in jqlang/jq#3056

Trims leading and trailing whitespace.

Was added to jq in jqlang/jq#3056
@wader
Copy link
Contributor Author

wader commented Sep 23, 2024

In the jq version of this there is an optimization to not create a new string if there is nothing to trim https://github.com/jqlang/jq/pull/3056/files#diff-bd792def7de6d4b8d70ddc844eabe52f73c264904e263580215219b669a2ca9cR1239-R1242 i tried to follow the rust code but not sure i understood fully how it would behave

@01mf02 01mf02 merged commit 4bcb90b into 01mf02:main Sep 23, 2024
1 check passed
@wader wader deleted the trim branch September 23, 2024 13:52
@01mf02
Copy link
Owner

01mf02 commented Sep 23, 2024

Thanks a lot for your contribution, @wader! It is absolutely exemplary --- updating docs, adding tests, a small implementation. And even linking to your original commit in jq. Great work.

I implemented on top of your work the optimisation that you mentioned, see ec3256a. It relies on the fact that trimming an already trimmed &str returns an identical &str, and we can compare these two in constant time by pointer equality (core::ptr::eq, which considers both the start of the string and the length of the string).

@wader
Copy link
Contributor Author

wader commented Sep 23, 2024

Thanks 🙏 and nice you added the optimisation, i haven't researched it but i suspect that most calls to trim will not trim anything. Maybe there are more string functions that could benefit of the same optimisation?

@01mf02
Copy link
Owner

01mf02 commented Sep 23, 2024

I just checked that, and ltrimstr as well as rtrimstr already perform this optimisation.
escape_csv and escape_sh are unfortunately no candidates for this optimisation due to the way that str::replace works. The same goes for ascii_downcase and ascii_upcase.

@wader
Copy link
Contributor Author

wader commented Sep 23, 2024

👍 now also remembered that i forgot to add tests for some different unicode whitespace codepoints, there is a bunch of different ones, ex a test like this https://github.com/jqlang/jq/blob/master/tests/jq.test#L1363-L1367. I did a quick check and it seems like jq, gojq and jaq all agree on what is whitespace now, ex javascript does not see U+0085 NEXT LINE as whitespace.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants