Making Nim's strings Copy-On-Write (COW) #221
Comments
I think slices should be of type |
We could also do this: In all assignments Then we would benefit from parameters being |
I agree with @Clyybber: before going all the way to COW, we need to use the sink/move/lent optimizations. Second, there are two performance issues with strings:
I'd argue that COW is solving the wrong problem (problem 1). The copies are often the result of a slice that stays in the same stack frame and can benefit from And the most important performance issue is intermediate allocations. COW doesn't solve the problem of allocating intermediate results. Sources (i.e. benchmarks should be inspired by the kind of code people are trying to write here):
|
Let me say it differently: |
It's a very basic type; it should be as simple as possible, given that it will be used in many contexts, including embedded, async and multithreading. Does the refcount become atomic with --threads:on? |
As simple as possible is covered by
Unlikely, we'll probably keep doing =deepCopy for threading. |
Wouldn't copy-on-write be better addressed in the type system instead of through run-time flags? A string literal can have a different type that is implicitly convertible to an immutable string, but also triggers a copy when being passed as With that said, I like the idea of allowing the allocated buffer of strings and sequences to be shared and reference-counted without introducing an indirection. Something along the lines of C++'s std::allocate_shared comes to mind, where the ref count appears in memory just before the normal string value. For me, the holy grail in the strings vs. sequences debate is figuring out a way to make One possible answer is that the Combining these two ideas, ref-counted strings can store their ref count at the end of the allocated cell for the buffer. The price you'll pay is that the string capacity is slightly decreased. You can check the capacity value to recognize the type of string you are working with, so you can support transitions between the ref-counted mode and the single-owner mode. |
That seems to be very bad for locality / CPU caches. |
In practice, for many programs the IO primitives offered by C often have APIs that take Given the other obvious benefits (type safety!), maybe it's time to replace the |
Well, I'll be happy if Otherwise, to explain the idea above more precisely, the cache locality argument shouldn't be that bad, because: A) We can consider the ref counting case as more rare and not worth optimising. To achieve this we only need a fast check for the question "Is this string ref-counted?". We can do this by checking a single bit in the A read of the B) As SSO proclaims, most strings are small (smaller than the cache line of a modern processor, which is likely to continue increasing). Thus, quite often, the ref count will also be in the CPU cache anyway. |
I had a read, feels that It does feel odd that |
C++ dreamed of that too, with Instead of making any cow solution doesn't just introduce one word of refcount overhead - it also forces every operation to maintain that refcount and introduces complex / costly operations in unexpected places - ie SSO is efficient for several reasons:
Borrowing is indeed the systemic solution to this problem: if it were introduced as an optimization, it could analyze away copies, so the programmer would pay little in terms of extra syntax. The way to approach the problem, then, would be to design the string type to be more amenable to this kind of analysis by constraining what you can do with it. |
reallocating here opens up a problematic memory safety issue as well - who owns the memory of |
It's trivial to make the const-section strings zero-terminated
With
There aren't enough differences to justify the code bloat and the buffer copies needed in conversions that stem from having two different types. |
This memory can certainly be freed if you consider the code to be equivalent to: callSomeCFunction(TempCString(data: createCStringCopy(nimString))) where |
So here is a working implementation https://github.com/planetis-m/dumpster/blob/master/cow/cowstrings.nim in case anyone wants to benchmark it. And here is a benchmark of std.strings/cowstrings/ssostrings. https://github.com/planetis-m/ssostrings/blob/master/tests/bench1.nim Also relevant: http://www.gotw.ca/publications/optimizations.htm |
Motivation: Make naive string handling code faster
In the ARC/ORC implementation, string literals are already neither copied nor destroyed. When these strings are potentially mutated, nimPrepareStrMutationV2 is called to copy the constant string into a buffer that allows for mutations. Currently the condition (s.p.cap and strlitFlag) == strlitFlag is used to determine whether s.p points to a constant string.

In Delphi/FPC a refcount of -1 is used instead, which allows for a more general COW mechanism. This RFC proposes to copy this design, as much everyday Nim code is written without the current expensive copies in mind, e.g.:
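To make the refcount -1 idea concrete, here is a small Nim model of it. Everything in it is hypothetical: Payload, CowStr, literal, share, prepareMutation and setChar are illustrative names, and a ref object stands in for the raw heap buffer. It is not the strs_v2.nim code, only a sketch of the rule that a buffer is copied on write when it is a literal (rc = -1) or shared (rc > 1).

```nim
# Hypothetical model of a Delphi/FPC-style COW string in Nim.
# Payload, CowStr, literal, share and setChar are illustrative names,
# not part of strs_v2.nim or any Nim library.

type
  Payload = ref object
    rc: int         # -1 = constant literal; otherwise the number of owners
    chars: string   # stand-in for the raw character buffer
  CowStr = object
    p: Payload

proc literal(s: string): CowStr =
  ## A literal is shared freely and never counted or mutated in place.
  CowStr(p: Payload(rc: -1, chars: s))

proc share(s: CowStr): CowStr =
  ## What an assignment `b = a` would do: bump the refcount, copy nothing.
  if s.p.rc >= 0:
    inc s.p.rc
  result = CowStr(p: s.p)

proc prepareMutation(s: var CowStr) =
  ## Copy the buffer only if it is a literal (-1) or has other owners (> 1).
  if s.p.rc != 1:
    if s.p.rc > 1:
      dec s.p.rc    # give up our share of the old buffer
    s.p = Payload(rc: 1, chars: s.p.chars)   # private, mutable copy

proc setChar(s: var CowStr; i: int; c: char) =
  prepareMutation(s)
  s.p.chars[i] = c

proc `$`(s: CowStr): string = s.p.chars

let a = literal("hello")
var b = a.share()          # literal: refcount stays -1, no copy yet
b.setChar(0, 'j')          # first write detaches b from the literal
doAssert $a == "hello" and $b == "jello"

var c = b.share()          # b and c now share one buffer with rc = 2
doAssert b.p == c.p and b.p.rc == 2
c.setChar(4, 'y')          # c detaches; b keeps the old buffer, rc back to 1
doAssert $b == "jello" and $c == "jelly"
doAssert b.p.rc == 1 and c.p.rc == 1
```

In an actual runtime the increment would live in the copy hook and the matching decrement in the destroy hook; the sketch calls these steps explicitly so it does not depend on hook signatures.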
This is particularly troublesome for async procs, where the snippet env.field = param is introduced so that parameters can be captured inside closures. However, a sink parameter probably has the same benefit.

As a side effect, Nim's ARC strings can support the old GC_ref/GC_unref calls.

Expected compat problems
It means that cast[seq[char]](mystr) stops working. We can detect this case in the compiler, and fusion / compat can introduce castToSeq/castToString cast operations that work for all runtimes.
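The RFC only names castToSeq/castToString. A portable sketch, assuming they take a string (or a char array) and return a fresh copy, could look like this. A runtime-specific version might avoid the copy; this one deliberately copies element by element so it does not depend on how any runtime lays out string and seq payloads.

```nim
# Hypothetical portable versions of the castToSeq/castToString helpers named
# in the RFC. They copy element by element instead of reinterpreting memory,
# so they work regardless of the string/seq memory layout.

proc castToSeq(s: string): seq[char] =
  result = newSeq[char](s.len)
  for i in 0 ..< s.len:
    result[i] = s[i]

proc castToString(s: openArray[char]): string =
  result = newString(s.len)
  for i in 0 ..< s.len:
    result[i] = s[i]

let chars = castToSeq("nim")
doAssert chars == @['n', 'i', 'm']
doAssert castToString(chars) == "nim"
doAssert castToSeq("").len == 0
```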
Alternatives/additions considered but rejected (for now)
Make Nim's strings use C++'s SSO
SSO looks like a rather fragile, questionable optimization: copying short strings around doesn't involve touching the heap, yet long strings are as slow to copy as before. And almost every operation internally needs to distinguish between "is this a long or a short string?". Also, in Nim, filenames/paths are currently not a dedicated type and are stored directly as strings too; most paths are longer than what the "short string" representation covers.
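To make the "every operation must branch" point concrete, here is a toy SSO string in Nim. The layout (an object variant with an inline buffer of ShortCap = 15 characters) is an assumption for illustration only, not how any particular C++ library or Nim proposal lays out its strings.

```nim
# Hypothetical SSO layout, for illustration only: short strings live inline,
# long strings go to the heap. ShortCap = 15 is an assumed capacity.
const ShortCap = 15

type
  SsoStr = object
    case isLong: bool
    of false:
      shortLen: uint8
      buf: array[ShortCap, char]
    of true:
      heap: string    # stand-in for a heap pointer + length + capacity

proc initSso(s: string): SsoStr =
  if s.len <= ShortCap:
    result = SsoStr(isLong: false, shortLen: uint8(s.len))
    for i in 0 ..< s.len:
      result.buf[i] = s[i]
  else:
    result = SsoStr(isLong: true, heap: s)

# Every accessor has to branch on the representation first.
proc len(s: SsoStr): int =
  if s.isLong: s.heap.len else: int(s.shortLen)

proc `[]`(s: SsoStr; i: int): char =
  if s.isLong: s.heap[i] else: s.buf[i]

proc `$`(s: SsoStr): string =
  result = newString(s.len)
  for i in 0 ..< s.len:
    result[i] = s[i]

let name = initSso("short")
let path = initSso("/usr/local/share/doc/nim/readme.txt")
doAssert not name.isLong and $name == "short"
doAssert path.isLong and path.len > ShortCap   # this path misses the short case
```

Copying name never touches the heap, but copying path still copies its heap string, which is the asymmetry the paragraph above criticizes.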
Make Nim's string slices O(1)
People coming from Go (and, for some reason, also people coming from Python) expect substr and s[1..3] to be faster than they really are. In the old runtime this was even worse, as string allocations caused the occasional GC run. In the current new runtime the allocations remain.

If the string implementation is changed from … to …, the slice operation can be done in O(1) time. However, every slice creation would cause an inc(s.p.refcount) operation and also involve a corresponding destruction step. This probably means that it's not fast enough in practice, so our effort is better spent on using more openArray[char] with borrowing rules instead. This also has the benefit that short string slices cannot keep a multi-megabyte string buffer alive for too long; this problem was big enough in the JVM world that they moved from O(1) slicing to O(N) slicing in a patch release.
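The openArray[char] direction can already be illustrated with toOpenArray from Nim's system module, which passes part of a string to a proc without building a new string. countVowels is a hypothetical example proc; any stricter borrowing rules beyond this are not shown.

```nim
# Borrowing instead of slicing: toOpenArray passes a view of part of a
# string to a proc taking openArray[char], without allocating a new string.
# countVowels is a hypothetical example proc.

proc countVowels(s: openArray[char]): int =
  for c in s:
    if c in {'a', 'e', 'i', 'o', 'u'}:
      inc result

let line = "nim strings and slices"
let copied = countVowels(line[4 .. 10])              # s[a..b] allocates a new string
let borrowed = countVowels(line.toOpenArray(4, 10))  # view into `line`, no allocation
doAssert copied == borrowed
```

Because the view only borrows line for the duration of the call, it cannot keep a large buffer alive afterwards, unlike a refcounted O(1) slice.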
Implementation effort
strs_v2.nim