Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add std/smartptrs #17333

Closed
wants to merge 35 commits into from
Closed

add std/smartptrs #17333

wants to merge 35 commits into from

Conversation

ringabout
Copy link
Member

@ringabout ringabout commented Mar 11, 2021

After discussing with araq, move smartptrs from fusion, add isolation support.

The original code:
https://github.com/nim-lang/fusion/blob/master/src/fusion/smartptrs.nim

thanks for the contributions of @Araq @planetis-m and @narimiran

thanks @cooldome
#10485

Todo:

  • an atomic runnableExamples for sharedPtr at the muti-threads environement
  • conisder implementing AtomicSharedPtr
  • use std/atomics instead of system/atomics and use moAquire instead of moConsume

@ringabout ringabout marked this pull request as draft March 11, 2021 03:24
@ringabout
Copy link
Member Author

ringabout commented Mar 11, 2021

Maybe also add

proc extract[T](p): T =
  move p.val

and count getter for SharedPtr

proc count(): int

and maybe use std/atomics instead of system/atomics?

lib/std/smartptrs.nim Outdated Show resolved Hide resolved
lib/std/smartptrs.nim Outdated Show resolved Hide resolved
lib/std/smartptrs.nim Outdated Show resolved Hide resolved
lib/std/smartptrs.nim Outdated Show resolved Hide resolved
lib/std/smartptrs.nim Outdated Show resolved Hide resolved
lib/std/smartptrs.nim Outdated Show resolved Hide resolved
lib/std/smartptrs.nim Outdated Show resolved Hide resolved
lib/std/smartptrs.nim Outdated Show resolved Hide resolved
lib/std/smartptrs.nim Outdated Show resolved Hide resolved
@konsumlamm
Copy link
Contributor

a.isNil == true => a.isNil
a.isNil == false => not a.isNil
(see #17333 (comment) & #17333 (comment))

@ringabout
Copy link
Member Author

a.isNil == true => a.isNil
a.isNil == false => not a.isNil
(see #17333 (comment) & #17333 (comment))

done

@ringabout ringabout marked this pull request as ready for review March 19, 2021 14:49
lib/std/smartptrs.nim Show resolved Hide resolved
@Varriount
Copy link
Contributor

I like the overall design, however the documentation needs improvement. The assumption should be that this module will be used by users who aren't familiar with C++ and its smart pointers.

@arnetheduck
Copy link
Contributor

essence of a

More than anything, it captures a "unique instance" perhaps - but not a pointer - because this type allocates its own memory in a specific memory pool, it also breaks all pointers that point to the original instance - this is a significant distinction, together with the fact that copying the instance may be expensive - a type X = array[1024*1024*100, byte] structure, when you put it in a UniquePtr as proposed will cause a memory usage spike of 100 mb and a copy - this is quite different from the semantics that one would expect from moving a "pointer" around.

In other words, it's a nice little utility maybe, but it doesn't have that much to do with the use cases that unique_ptr in C++ typically are used for, neither is it really a traditional "pointer" in the nim sense.

@Araq
Copy link
Member

Araq commented Jun 25, 2021

I'm not sure what you mean.

@planetis-m
Copy link
Contributor

@arnetheduck turns out you can attach destructors to distinct ptr, so imo that would be a more elegant way for an object pool to work, regardless post an example plz, I would be interested to tinker with.

@dom96
Copy link
Contributor

dom96 commented Jun 25, 2021

Agree with @Varriount. The documentation should also contain what this module's intended use cases are. I'm actually wondering: why are we adding such a big C++ concept to Nim's stdlib?

@Grinshpon
Copy link

I'm actually wondering: why are we adding such a big C++ concept to Nim's stdlib?

Nim seems to be going in a direction that promotes gc-less programming, at least according to Araq's blog, and so having smart pointers would be extremely beneficial as a way to not deal with raw pointers, in cases where the programmer still needs to be dealing with pointers.

@Varriount
Copy link
Contributor

I'm actually wondering: why are we adding such a big C++ concept to Nim's stdlib?

Nim seems to be going in a direction that promotes gc-less programming, at least according to Araq's blog, and so having smart pointers would be extremely beneficial as a way to not deal with raw pointers, in cases where the programmer still needs to be dealing with pointers.

Eh. The ARC GC essentially replicates what C++ smart pointers do - both use refcounting mechanisms to determine when memory should be released. My interpretation is that this module was wanted because ARC doesn't do atomic refcounting, and this module can implement that.

@Araq
Copy link
Member

Araq commented Jun 26, 2021

My interpretation is that this module was wanted because ARC doesn't do atomic refcounting, and this module can implement that.

Entirely correct. These shared pointers are a last resort, the recommended solution is to send subgraphs via isolate.

@dom96
Copy link
Contributor

dom96 commented Jun 27, 2021

What's the blocker for changing ARC to do atomic refcounting? Is it something that we decided not to do ever or is this module just to provide a workaround in the short-term?

If the former we should make it clear in the docs of this module that it should be used for shared memory management. If the latter we shouldn't have this module in stdlib.

@Araq
Copy link
Member

Araq commented Jun 28, 2021

What's the blocker for changing ARC to do atomic refcounting?

Atomic refcounting is simply too slow for idiomatically written Nim code.

Is it something that we decided not to do ever or is this module just to provide a workaround in the short-term?

No, it's a very long-term solution as I doubt hardware will ever make atomic operations fast enough. Unfortunately.

@dom96
Copy link
Contributor

dom96 commented Jun 28, 2021

Isn't there a safer alternative? Having to manage memory manually will make our parallelism a worse experience than most languages out there.

@Araq
Copy link
Member

Araq commented Jun 28, 2021

Er, this is pretty much Rust's solution, it's not unsafe, the unsafety is hidden behind the API. But anyhow, the encouraged solution is to use isolate instead.

@planetis-m
Copy link
Contributor

planetis-m commented Jun 29, 2021

More than anything, it captures a "unique instance" perhaps - but not a pointer - because this type allocates its own memory in a specific memory pool, it also breaks all pointers that point to the original instance - this is a significant distinction, together with the fact that copying the instance may be expensive - a type X = array[1024*1024*100, byte] structure, when you put it in a UniquePtr as proposed will cause a memory usage spike of 100 mb and a copy - this is quite different from the semantics that one would expect from moving a "pointer" around.

Try: https://gist.github.com/planetis-m/42b675403212e018b5b9c9cc2378dffc

Clang is able to optimize this, there is no memcpy, data are placed directly on the heap. Whereas gcc produces code that crashes with SIGSEGV.

Screenshot_20210706_225250

...but it only works with -d:release.

@dom96
Copy link
Contributor

dom96 commented Jun 29, 2021

Er, this is pretty much Rust's solution, it's not unsafe, the unsafety is hidden behind the API. But anyhow, the encouraged solution is to use isolate instead.

What stops us from making memory sharing across threads as easy as in Golang or Python? I think Nim should be more expressive than Rust here, so comparing to Rust isn't enough.

@Araq
Copy link
Member

Araq commented Jun 29, 2021

Golang needs runtime data race checking and Python's multi-threading story is famous for its global interpreter lock...

so comparing to Rust isn't enough.

It's good enough for me and nobody has a better solution.

@dom96
Copy link
Contributor

dom96 commented Jun 29, 2021

Golang needs runtime data race checking

And smartptrs doesn't?

Anyway, we can discuss offline. Before merging this let's change the docs of this module though to reference isolate and make it clear that this module is for rare edge cases where isolate is not enough.

@planetis-m
Copy link
Contributor

So, in order for [] to be memory safe, we need to return Isolated[var T] or something.

@Grinshpon
Copy link

Atomic refcounting is simply too slow for idiomatically written Nim code.

I'd like some clarification on this. Are SharedPtr's slower than ref's with --gc:arc?

@Araq
Copy link
Member

Araq commented Jul 21, 2021

I'd like some clarification on this. Are SharedPtr's slower than ref's with --gc:arc?

Yes, definitely. Rust has both Rc and Arc for a reason.

@Grinshpon
Copy link

Yes, definitely. Rust has both Rc and Arc for a reason.

May I ask why? To my knowledge, a SharedPtr is just incrementing/decrementing an integer counter, unless this has to do with low-level stuff I don't know about yet

@Araq
Copy link
Member

Araq commented Jul 22, 2021

May I ask why? To my knowledge, a SharedPtr is just incrementing/decrementing an integer counter, unless this has to do with low-level stuff I don't know about yet

It's an atomic count operation, 30x slower than an ordinary count operation. It prevents the CPU from doing pipelining effectively and needs communication between the different CPU cores.

From http://iacoma.cs.uiuc.edu/iacoma-papers/pact18.pdf
"As shown in the figure, performing RC operations takes on
average 42% of the execution time in client programs, and 15% in
server programs. The average across all programs can be shown to
be 32%"

result.val[] = val.extract
# no destructor call for 'val: sink T' here either.

template newUniquePtr*[T](val: T): UniquePtr[T] =
Copy link
Member

@timotheecour timotheecour Jul 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as mention in #18450 (comment) for the other PR, this is what's needed:

proc newUniquePtr*[T](U: typedesc[T]): UniquePtr[T] =
  result.val = cast[ptr T](allocShared(sizeof(T)))

which also ends up producing correct results for the C++ example i gave, unlike what the other APIs do.

read the whole thread starting here for more context: #18450 (comment)

I also wonder whether we actually need proc newUniquePtr[T](val: sink Isolated[T]): UniquePtr[T] {.nodestroy.} = given that proc newUniquePtr*[T](U: typedesc[T]): UniquePtr[T] = seems simpler, more correct, and more performant (no copy)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand. Don't you want the unique pointer to point to some valid object? Your proc doesn't do the same.

Copy link
Member

@timotheecour timotheecour Jul 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm allocating on the heap only once instead of allocating in caller scope (eg on stack) and then allocating on heap and copying; it's more efficient and analogous to C++ placement new on which C++ make_unique relies, avoids complications of isolated, and is also more correct wrt destructors being called as you can see in #18450 (comment)

Caller can then, if needed, assign values in the heap allocated object, eg:

import std/smartptrs

type ManagedFile = object
  name: string
  cfile: File

proc `=destroy`(a: var ManagedFile) =
  echo ("in =destroy", a.name)
  close(a.cfile)
  a.cfile = nil

proc main()=
  block:
    let p = newUniquePtr(ManagedFile)
    p[].name = currentSourcePath
    p[].cfile = open(p[].name, fmRead)
    echo p[].cfile.readAll.len
  echo "after"
main()

prints:

381
("in =destroy", "/Users/timothee/git_clone/nim/timn/tests/nim/all/t12621.nim")
after

It's also the correct way to wrap C++ objects (eg with non-trivial constructor): it allows the C++ object to be constructed exactly once, and then destroyed when the unique_ptr goes out of scope.

Copy link
Contributor

@planetis-m planetis-m Jul 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@timotheecour its already taken care of https://github.com/nim-lang/threading/blob/master/threading/smartptrs.nim#L48 Its not more efficient though, you should use the sink one (unless you have to) as I said in #17333 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    p[].name = currentSourcePath
    p[].cfile = open(p[].name, fmRead)

is not as efficient as direct inplace construction. Not because of the lack of moves, but because the compiler doesn't know that in p[] there is nothing to destroy yet.

Copy link
Member

@Araq Araq Jul 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's one, showing that the way I proposed is 3.4X faster, where the speedup depends on the size of the object; you'll get smaller speedups for smaller objects but always at least as fast.

But that's measuring artifacts of the current implementation, in particular NVRO (which is unfortunately currently fragile) or the lack thereof. In theory it's a pessimization as you mitigated the compiler's ability to reason about the code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't understand the reasoning here; in current code, you:
1 construct an object Foo in the stack
2 allocate sizeof(Foo) on the heap
3 copy Foo

with proposed code you:
1 allocate sizeof(Foo) on the heap
2 construct in-place

in all possible scenarios, it should be at least as fast, and usually faster; eg it allows placement-new with C++, or simply just constructing in-place the fields you care about updating from their default value (if some of the memory is to remain 0-initialized, you don't pay for it); you also don't have extra ctor/dtor to worry about, in particular for C++ objects.

n element version

Finally, this generalizes in a straightforward way to C++'s array version of make_unique, which we should add at some point; it allows dynamic allocation of n objects without having to use a seq (eg so you can interface with C++ or avoid relying on gc for allocating n elements); there are 2 approaches to consider:

  • same as C++, make the n implicit (user code need to pass that value around somewhere else)
proc newUniquePtr*[T](U: typedesc[T], n = 1): UniquePtr[T] =
  result.val = cast[ptr T](allocShared(sizeof(T) * n))
  • add a UniquePtrN type with an extra field n: int
type
  UniquePtrN*[T] = object
    val: ptr T
    n: int
proc newUniquePtrN*[T](U: typedesc[T], n = 1): UniquePtrN[T] =
  result.val = cast[ptr T](allocShared(sizeof(T) * n))
  result.n = n

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow, the loophole is already there,

I am just pointing it out. I had totally missed it before.

n element version

Why not use a seq instead?

Copy link
Member

@Araq Araq Jul 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't understand the reasoning here; in current code, you: ...

Your're correct, however, the code is still not as efficient as it could be if we had proper "in-place" construction. I think we could do proper in-place construction for the construct toSinkParam ObjConstr(x: x, y: y, ...).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for c++ objects, it should already be possible today without a language extension (via importcpp of new(buf) T), although i haven't tried yet
for nim objects, it shouldn't be hard to extend semObjConstr to take an optional input buffer address, eg:

var p: ptr Foo = ...
Foo.new(p, a1: 1, a2: 2) # explicit syntax
p[] = Foo(a1: 1, a2: 2) # syntax sugar possible?

in any case, these can be added later, and proc newUniquePtrN*[T](U: typedesc[T]): UniquePtrN[T] = can be introduced before those exist; code would just get auto-optimized once p[] = Foo(a1: 1, a2: 2) starts working as in-place construction

@arnetheduck
Copy link
Contributor

More than anything, it captures a "unique instance" perhaps - but not a pointer - because this type allocates its own memory in a specific memory pool, it also breaks all pointers that point to the original instance - this is a significant distinction, together with the fact that copying the instance may be expensive - a type X = array[1024*1024*100, byte] structure, when you put it in a UniquePtr as proposed will cause a memory usage spike of 100 mb and a copy - this is quite different from the semantics that one would expect from moving a "pointer" around.

Try: https://gist.github.com/planetis-m/42b675403212e018b5b9c9cc2378dffc

Clang is able to optimize this, there is no memcpy, data are placed directly on the heap. Whereas gcc produces code that crashes with SIGSEGV.

Screenshot_20210706_225250

...but it only works with -d:release.

Relying on the C-level optimizer tends to be a bit fragile - either we must limit construction such that the Nim compiler can guarantee that it generates code with in-place construction in all valid cases (ie even with instances returned from a function etc), or it might be difficult to rely on the type in generic code where it's hard to control the stack size.

We already have many issues with stack overflow in Nim because the rules for when RVO happens are .. fragile, like this.

@arnetheduck
Copy link
Contributor

@arnetheduck turns out you can attach destructors to distinct ptr, so imo that would be a more elegant way for an object pool to work, regardless post an example plz, I would be interested to tinker with.

https://stackoverflow.com/questions/28843379/how-to-use-boostobject-pool-with-stdunique-ptr would be a classic example

@ringabout ringabout closed this Sep 22, 2021
@ringabout ringabout deleted the a147 branch September 19, 2022 14:27
@ringabout ringabout removed the TODO: followup needed remove tag once fixed or tracked elsewhere label Sep 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.