add std/smartptrs #17333
Maybe also add
proc extract[T](p): T =
  move p.val
and a count getter for SharedPtr, proc count(): int, and maybe use |
Co-authored-by: konsumlamm <[email protected]> Co-authored-by: Juan Carlos <[email protected]>
|
I like the overall design, however the documentation needs improvement. The assumption should be that this module will be used by users who aren't familiar with C++ and its smart pointers. |
More than anything, it captures a "unique instance" perhaps - but not a pointer - because this type allocates its own memory in a specific memory pool, it also breaks all pointers that point to the original instance - this is a significant distinction, together with the fact that copying the instance may be expensive - a In other words, it's a nice little utility maybe, but it doesn't have that much to do with the use cases that |
I'm not sure what you mean. |
@arnetheduck turns out you can attach destructors to distinct ptr, so imo that would be a more elegant way for an object pool to work. Regardless, please post an example; I would be interested to tinker with it. |
Agree with @Varriount. The documentation should also contain what this module's intended use cases are. I'm actually wondering: why are we adding such a big C++ concept to Nim's stdlib? |
Nim seems to be going in a direction that promotes gc-less programming, at least according to Araq's blog, and so having smart pointers would be extremely beneficial as a way to not deal with raw pointers, in cases where the programmer still needs to be dealing with pointers. |
Eh. The ARC GC essentially replicates what C++ smart pointers do - both use refcounting mechanisms to determine when memory should be released. My interpretation is that this module was wanted because ARC doesn't do atomic refcounting, and this module can implement that. |
Entirely correct. These shared pointers are a last resort, the recommended solution is to send subgraphs via |
What's the blocker for changing ARC to do atomic refcounting? Is it something that we decided not to do ever or is this module just to provide a workaround in the short-term? If the former we should make it clear in the docs of this module that it should be used for shared memory management. If the latter we shouldn't have this module in stdlib. |
Atomic refcounting is simply too slow for idiomatically written Nim code.
No, it's a very long-term solution as I doubt hardware will ever make atomic operations fast enough. Unfortunately. |
Isn't there a safer alternative? Having to manage memory manually will make our parallelism a worse experience than most languages out there. |
Er, this is pretty much Rust's solution, it's not unsafe, the unsafety is hidden behind the API. But anyhow, the encouraged solution is to use |
Try: https://gist.github.com/planetis-m/42b675403212e018b5b9c9cc2378dffc Clang is able to optimize this, there is no memcpy, data are placed directly on the heap. Whereas gcc produces code that crashes with SIGSEGV. ...but it only works with |
What stops us from making memory sharing across threads as easy as in Golang or Python? I think Nim should be more expressive than Rust here, so comparing to Rust isn't enough. |
Golang needs runtime data race checking and Python's multi-threading story is famous for its global interpreter lock...
It's good enough for me and nobody has a better solution. |
And Anyway, we can discuss offline. Before merging this let's change the docs of this module though to reference |
So, in order for |
I'd like some clarification on this. Are |
Yes, definitely. Rust has both Rc and Arc for a reason. |
May I ask why? To my knowledge, a SharedPtr is just incrementing/decrementing an integer counter, unless this has to do with low-level stuff I don't know about yet |
It's an atomic count operation, 30x slower than an ordinary count operation. It prevents the CPU from doing pipelining effectively and needs communication between the different CPU cores. From http://iacoma.cs.uiuc.edu/iacoma-papers/pact18.pdf |
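To make the two kinds of count operation concrete, here is a minimal Nim sketch (not part of this PR) that puts an ordinary counter next to an atomic one from std/atomics. The names plainCount, atomicCount, bumpPlain and bumpAtomic are made up for illustration, and the snippet does not measure the 30x figure quoted above; it only shows which operation each kind of refcount would perform.

```nim
import std/atomics

var plainCount = 0             # ordinary counter, as a non-atomic refcount would use
var atomicCount: Atomic[int]   # zero-initialized atomic counter

proc bumpPlain(n: int) =
  # Plain increment: a normal add the CPU can pipeline freely.
  for _ in 0 ..< n:
    inc plainCount

proc bumpAtomic(n: int) =
  # Atomic read-modify-write: the cores have to coordinate on this location.
  for _ in 0 ..< n:
    discard atomicCount.fetchAdd(1, moRelaxed)

bumpPlain(1000)
bumpAtomic(1000)
echo plainCount
echo atomicCount.load(moRelaxed)
```

Both counters end at the same value in a single thread; the difference only shows up in cost and in correctness once several threads touch the same count.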
  result.val[] = val.extract
  # no destructor call for 'val: sink T' here either.

template newUniquePtr*[T](val: T): UniquePtr[T] =
As mentioned in #18450 (comment) for the other PR, this is what's needed:
proc newUniquePtr*[T](U: typedesc[T]): UniquePtr[T] =
  result.val = cast[ptr T](allocShared(sizeof(T)))
which also ends up producing correct results for the C++ example I gave, unlike what the other APIs do.
Read the whole thread starting here for more context: #18450 (comment)
I also wonder whether we actually need proc newUniquePtr[T](val: sink Isolated[T]): UniquePtr[T] {.nodestroy.} = given that proc newUniquePtr*[T](U: typedesc[T]): UniquePtr[T] = seems simpler, more correct, and more performant (no copy).
I don't understand. Don't you want the unique pointer to point to some valid object? Your proc doesn't do the same.
I'm allocating on the heap only once instead of allocating in caller scope (eg on stack) and then allocating on heap and copying; it's more efficient and analogous to C++ placement new, on which C++ make_unique relies, avoids complications of isolated, and is also more correct wrt destructors being called, as you can see in #18450 (comment)
Caller can then, if needed, assign values in the heap allocated object, eg:
import std/smartptrs

type ManagedFile = object
  name: string
  cfile: File

proc `=destroy`(a: var ManagedFile) =
  echo ("in =destroy", a.name)
  close(a.cfile)
  a.cfile = nil

proc main()=
  block:
    let p = newUniquePtr(ManagedFile)
    p[].name = currentSourcePath
    p[].cfile = open(p[].name, fmRead)
    echo p[].cfile.readAll.len
  echo "after"
main()
prints:
381
("in =destroy", "/Users/timothee/git_clone/nim/timn/tests/nim/all/t12621.nim")
after
It's also the correct way to wrap C++ objects (eg with non-trivial constructor): it allows the C++ object to be constructed exactly once, and then destroyed when the unique_ptr goes out of scope.
@timotheecour it's already taken care of: https://github.com/nim-lang/threading/blob/master/threading/smartptrs.nim#L48 It's not more efficient though; you should use the sink one (unless you have to) as I said in #17333 (comment)
p[].name = currentSourcePath
p[].cfile = open(p[].name, fmRead)
is not as efficient as direct in-place construction. Not because of the lack of moves, but because the compiler doesn't know that in p[] there is nothing to destroy yet.
Here's one, showing that the way I proposed is 3.4X faster, where the speedup depends on the size of the object; you'll get smaller speedups for smaller objects but always at least as fast.
But that's measuring artifacts of the current implementation, in particular NRVO (which is unfortunately currently fragile) or the lack thereof. In theory it's a pessimization, as you have limited the compiler's ability to reason about the code.
I still don't understand the reasoning here; in current code, you:
1. construct an object Foo on the stack
2. allocate sizeof(Foo) on the heap
3. copy Foo
with proposed code you:
1. allocate sizeof(Foo) on the heap
2. construct in-place
in all possible scenarios, it should be at least as fast, and usually faster; eg it allows placement-new with C++, or simply just constructing in-place the fields you care about updating from their default value (if some of the memory is to remain 0-initialized, you don't pay for it); you also don't have extra ctor/dtor to worry about, in particular for C++ objects.
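The two sequences of steps listed above can be sketched in plain Nim with raw shared allocation. This is an illustration only, not the code from this PR: Foo, viaCopy and inPlace are hypothetical names, the object holds only plain ints (so no destructors are involved), and allocShared0 is used for the in-place path so untouched fields start out zeroed.

```nim
type Foo = object
  a, b: int

proc viaCopy(): ptr Foo =
  # Current approach: build the value in the caller's scope, then copy it.
  let tmp = Foo(a: 1, b: 2)                            # 1. construct on the stack
  result = cast[ptr Foo](allocShared(sizeof(Foo)))     # 2. allocate on the heap
  result[] = tmp                                       # 3. copy into the heap slot

proc inPlace(): ptr Foo =
  # Proposed approach: allocate once, then fill the heap slot directly.
  result = cast[ptr Foo](allocShared0(sizeof(Foo)))    # 1. allocate (zeroed)
  result.a = 1                                         # 2. construct in-place
  result.b = 2

let p1 = viaCopy()
let p2 = inPlace()
echo p1[] == p2[]   # both paths end with the same object contents
deallocShared(p1)
deallocShared(p2)
```

Whether the second path is actually faster in practice depends on what the compiler does with the temporary, which is the point being debated in this thread.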
n element version
Finally, this generalizes in a straightforward way to C++'s array version of make_unique, which we should add at some point; it allows dynamic allocation of n objects without having to use a seq (eg so you can interface with C++ or avoid relying on gc for allocating n elements); there are 2 approaches to consider:
- same as C++, make the n implicit (user code needs to pass that value around somewhere else)
  proc newUniquePtr*[T](U: typedesc[T], n = 1): UniquePtr[T] =
    result.val = cast[ptr T](allocShared(sizeof(T) * n))
- add a UniquePtrN type with an extra field n: int
  type
    UniquePtrN*[T] = object
      val: ptr T
      n: int
  proc newUniquePtrN*[T](U: typedesc[T], n = 1): UniquePtrN[T] =
    result.val = cast[ptr T](allocShared(sizeof(T) * n))
    result.n = n
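As a rough sketch of how the second approach could look once ownership hooks and element access are added: everything beyond the type and constructor quoted above (the `=destroy` and `=copy` hooks, the bounds-checked `[]` accessor, and the switch to allocShared0) is an assumption for illustration, not the API of std/smartptrs. It is only meant for element types without destructors of their own, such as int.

```nim
type
  UniquePtrN[T] = object
    val: ptr T   # start of a block holding n elements
    n: int       # element count, carried with the pointer

proc `=destroy`[T](p: var UniquePtrN[T]) =
  # Free the whole block; elements are assumed to need no destruction.
  if p.val != nil:
    deallocShared(p.val)

# Copying is disallowed so that exactly one owner frees the block.
proc `=copy`[T](dest: var UniquePtrN[T], src: UniquePtrN[T]) {.error.}

proc newUniquePtrN[T](U: typedesc[T], n = 1): UniquePtrN[T] =
  # Assumption: zeroed memory, so every element starts at its default value.
  result.val = cast[ptr T](allocShared0(sizeof(T) * n))
  result.n = n

proc `[]`[T](p: UniquePtrN[T], i: int): var T =
  # Bounds-checked access to the i-th element of the block.
  doAssert i >= 0 and i < p.n
  cast[ptr UncheckedArray[T]](p.val)[i]

proc main() =
  let p = newUniquePtrN(int, 4)
  p[2] = 7
  echo p[2]
  echo p.n

main()
```

Storing n in the object is what makes the bounds check possible; with the implicit-n variant the caller would have to pass the length to every access.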
I don't follow, the loophole is already there,
I am just pointing it out. I had totally missed it before.
n element version
Why not use a seq instead?
I still don't understand the reasoning here; in current code, you: ...
You're correct; however, the code is still not as efficient as it could be if we had proper "in-place" construction. I think we could do proper in-place construction for the construct toSinkParam ObjConstr(x: x, y: y, ...).
for C++ objects, it should already be possible today without a language extension (via importcpp of new(buf) T), although I haven't tried yet
for Nim objects, it shouldn't be hard to extend semObjConstr to take an optional input buffer address, eg:
  var p: ptr Foo = ...
  Foo.new(p, a1: 1, a2: 2) # explicit syntax
  p[] = Foo(a1: 1, a2: 2) # syntax sugar possible?
in any case, these can be added later, and proc newUniquePtrN*[T](U: typedesc[T]): UniquePtrN[T] = can be introduced before those exist; code would just get auto-optimized once p[] = Foo(a1: 1, a2: 2) starts working as in-place construction
Relying on the C-level optimizer tends to be a bit fragile - either we must limit construction such that the Nim compiler can guarantee that it generates code with in-place construction in all valid cases (ie even with instances returned from a function etc), or it might be difficult to rely on the type in generic code where it's hard to control the stack size. We already have many issues with stack overflow in Nim because the rules for when RVO happens are .. fragile, like this. |
https://stackoverflow.com/questions/28843379/how-to-use-boostobject-pool-with-stdunique-ptr would be a classic example |
After discussing with araq, move smartptrs from fusion, add isolation support.
The original code:
https://github.com/nim-lang/fusion/blob/master/src/fusion/smartptrs.nim
thanks for the contributions of @Araq @planetis-m and @narimiran
thanks @cooldome
#10485
Todo:
- sharedPtr at the multi-threads environment
- AtomicSharedPtr
- std/atomics instead of system/atomics, and use moAcquire instead of moConsume