Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add "+=" operator for strings #1500

Closed
Mohan-Dhawan opened this issue Aug 18, 2023 · 3 comments · Fixed by #1633
Closed

Add "+=" operator for strings #1500

Mohan-Dhawan opened this issue Aug 18, 2023 · 3 comments · Fixed by #1633
Assignees
Labels
Feature Request Request for new functionality

Comments

@Mohan-Dhawan
Copy link

Strings do not have the += operator and concatenation is expensive when doing a = a + b;. An alternative is to convert strings to bytes and then concatenate, but it would be nice to have += supported for strings natively.

@bbannier bbannier added the Feature Request Request for new functionality label Aug 18, 2023
@vpax
Copy link

vpax commented Aug 18, 2023

[EDIT: here I was actually referring to Zeek scripting, I missed that this issue is for Spicy rather than Zeek scripting]

Just to tease apart two separate issues here: (1) the streamlined simplicity of being able to express a += b to append strings (which I'm a fan of, it's quite analogous to existing operators in the language), vs. (2) the desire that concatenation of a new string to the end of an existing string be efficient (which I'm also of fan of!). To do that requires more than a += operator - there would need to be a change to the internal representation of strings. I could see doing that, since we keep getting bitten by this sort of O(N^2) behavior leading to DOS threats, but it's more involved than just supporting a += operator.

@bbannier
Copy link
Member

bbannier commented Aug 20, 2023

@vpax, could you clarify (2), in particular where you see quadratic performance for +=? Right now Spicy string' runtime type is a pretty thin wrapper around C++'s std::string so it inherits its tradeoffs and runtime characteristics (e.g., in the impl used by my compiler no naive capacity doubling on s = s + s, but still double iteration since strings fully own data). Simple benchmarks also seem to show that execution time has only a weak dependency on the length up to a couple thousand characters (likely: values fit in fast cache).

For extremely long strings it might still make sense to switch to a different data structure (e.g., non-continuous storage, more involved data structures like ropes or similar, ...), but the exact tradeoffs depend likely on what operations you would like to see made faster. One could e.g., make s = s + s fast by using a refcounted implementation for individual rope segments, but this could also lead to much slower string operations for e.g., short strings. Would you mind capturing the exact uses cases you have in mind in a fresh ticket or discussion so we can track this further?

@vpax
Copy link

vpax commented Aug 20, 2023

@bbannier sorry I missed that this was in a Spicy context. I was referring to regular Zeek.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature Request Request for new functionality
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants