
adds an experimental mm:atomicArc switch #21798

Merged (1 commit, May 8, 2023)

Conversation

@ringabout (Member) commented May 6, 2023

ref https://forum.nim-lang.org/t/10161

Bootstrapping times (Intel 12th gen i7):

  • 6.374s (--mm:arc --threads:on)
  • 7.107s (--mm:atomicArc --threads:on)

@arnetheduck (Contributor) commented:

This kind of kicks the can down the road: you still have all the shared mutable ownership problems, just at a different level of the code, while all code pays a price.

For threading, the ability to express unique ownership is the constraint that makes passing data between threads easy: you guarantee at all levels that nobody else is accessing the data.

@Araq (Member) commented May 6, 2023

On my Ryzen CPU I get these numbers for bootstrapping:

ARC: 165494 lines; 7.864s; 489.047MiB peakmem;
Atomic ARC: 165494 lines; 7.989s; 489.059MiB peakmem;

Likewise for the "binary trees" benchmark, but that benchmark is flawed for this comparison, so there is no need to report the numbers.

On OSX M1:

atomicArc: 167027 lines; 5.773s; 490.902MiB peakmem;
arc: 167027 lines; 6.070s; 491.109MiB peakmem;

@jmgomez (Collaborator) commented May 6, 2023

M1 Max:
  • --mm:arc --threads:on: 6.042s
  • --mm:atomicArc --threads:on: 6.030s

Threadripper 24:
  • --mm:arc --threads:on: 18.247s
  • --mm:atomicArc --threads:on: 20.056s

@Clonkk (Contributor) commented May 6, 2023

Benchmark on Ubuntu 23.04, Intel i5 6th gen 1.6GHz - 4 core 8 threads.

./koch boot -d:release --threads:on --mm:arc

Hint: mm: arc; threads: on; opt: speed; options: -d:release
166803 lines; 14.771s; 490.719MiB peakmem;

./koch boot -d:release --threads:on --mm:atomicArc

Hint: mm: atomicArc; threads: on; opt: speed; options: -d:release
166803 lines; 14.022s; 490.762MiB peakmem;

Over multiple runs I get results varying between 12 and 14s, with very little difference between atomicArc and arc.

@guzba (Contributor) commented May 7, 2023

This kind of kicks the can down the road: you still have all the shared mutable ownership problems, just at a different level of the code, while all code pays a price.

For the sake of debate, I will take the counter-position to this statement (I also posted this on the Nim forum).

With atomic ARC appearing to have negligible performance impact, I currently see it as a better option than the current ARC. It calls into question whether non-atomic ARC should even exist.

With this change, ref objects behave much more like their counterparts in other very popular languages such as Swift, C#, Go, and Java. This mental-model compatibility seems very valuable to me.

Yes, you still need locks to mutate the state; we're all very used to that from other languages, it's fine and a separate concern. Independent graphs do not capture all shared-heap use cases, so it seems like a pretty narrow focus.

Without atomic ARC, ref objects can basically never be used in the same ways with threads as one would expect from other languages. With atomic ARC, even if there is a performance cost in some cases, the programmer can overcome that cost with some effort. Basically, in one direction there is an impossible hurdle, in the other there is a very manageable hurdle that often does not matter or even exist at all.

I do not see clearly how ORC fits into this work, though: while the RC would be atomic, it seems to me that ORC would need a global lock for adding and checking for cycles. Maybe this is fine too, considering that once again a programmer can just add {.acyclic.} to avoid any cost when it is a known non-issue. An atomic ARC but non-atomic ORC would be a strange mix.
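For illustration, this is roughly the usage pattern being argued for: a plain ref shared by several threads, with a lock guarding mutation while the refcount itself is handled by the runtime. This is my own hypothetical sketch, assuming this PR's --mm:atomicArc --threads:on; the gcsafe cast reflects that the workers touch a global containing GC'd memory.

```nim
import std/locks

type SharedCounter = ref object
  value: int

var
  guard: Lock
  counter = SharedCounter()   # refcount updates become atomic under atomicArc

proc worker() {.thread.} =
  {.cast(gcsafe).}:           # we promise that the lock makes this safe
    for _ in 1 .. 1000:
      withLock guard:         # a lock is still required to mutate the state
        inc counter.value

initLock(guard)
var threads: array[4, Thread[void]]
for t in threads.mitems:
  createThread(t, worker)
joinThreads(threads)
echo counter.value            # expected: 4000
```

Under non-atomic ARC the same program would race on the refcount whenever a worker copies the ref, which is exactly the failure mode this switch is meant to remove.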

@AmjadHD (Contributor) commented May 7, 2023

Does this affect single-threaded code?

@Araq (Member) commented May 7, 2023

Does this affect single-threaded code?

Yes for --threads:on, no for --threads:off.

@mratsim (Collaborator) commented May 7, 2023

Recent languages have struggled with how to deal safely with shared ownership.

I think it's important to distinguish between ref object and atomicRef object so that objects with shared ownership can be easily spotted, audited, and reviewed.

Furthermore, recent languages, with the hindsight of C/C++ mutability issues, have strongly discouraged shared ownership and encouraged producer-consumer patterns and explicit transfer of ownership via channels (Go / Rust).

Making every ref implicitly an atomicRef is a step in the wrong direction for me, and I'd strongly prefer atomicRef to be explicit.

@guzba (Contributor) commented May 7, 2023

encouraged producer-consumer patterns and explicit transfer of ownership via channels (Go / Rust)

Sure, this is a great idea; in practice, however, one can observe that locks on a shared reference type are still how things end up being done, in spite of this encouragement.

Database connection pools are an example that comes to mind right away, just to provide a concrete example:

Redis: https://github.com/redis/go-redis/blob/master/internal/pool/pool.go#L244 or https://github.com/gomodule/redigo/blob/master/redis/pool.go#L198

Postgres: https://github.com/jackc/pgx/blob/master/pgxpool/pool.go#L489 + https://github.com/jackc/puddle/blob/master/pool.go#L344

Rust seems to have turned this into "Rocket" science: https://github.com/SergioBenitez/Rocket/blob/master/contrib/sync_db_pools/lib/src/lib.rs. Not sure if anyone really uses this, though. There is also https://github.com/sfackler/r2d2/blob/master/src/lib.rs#L418 (Mutex).

Now, perhaps I am finding the lock solutions because I'm asking a lock question; however, there is clearly a need for either atomicRef or an explicit construct as suggested. So, past that:

My issue with an explicit construct is that we've now created a whole new class of thing that is different from other things, and the only reason is either 1) performance, which is a measurable trade-off, maybe a non-issue, and avoidable with --threads:off, or 2) to require programmers to do things the way you want them to, when that way is provably not chosen where alternatives are available.

I am not so sure I know the answer for how to build things for everyone, so I suggest keeping things simple and avoiding further combinatorial multipliers (oh, that is an aref, which is special compared to other refs, so you can't put refs on an aref but you can put an aref on a ref; we all know this problem). It is also the case that an aref would have all the issues of a ref under atomicArc, so it really only serves as a flag, coming with all that cost for such a small benefit.

Thanks for letting me add my pov and for sharing yours.

@arnetheduck (Contributor) commented May 7, 2023

Recent languages have struggled with how to deal safely with shared ownership.

further stressing this point: shared ownership is a problem in all contexts. The problems get exacerbated wildly with any out-of-order execution, and this includes async and other thread-like constructs. Nim is currently lacking in the single-ownership area in general; it is not a problem limited to threads (ergo, any solution to this problem should hopefully apply both with and without threads, since it is essentially the exact same problem).

... strongly discouraged shared ownership and encouraged producer-consumer patterns ... Making all ref atomicRef implicitly is a step in the wrong direction

+1. Even C++ has moved on to unique_ptr as its main ref-like construct.

producer-consumer

Incidentally, this also solves contention and a host of other multithreading / atomics-related performance issues.

shared reference type is still how things end up being done

In these exceptional cases where low-level control is desired, maybe it's not ref you're looking for? create can be used to make perfectly fine user-managed ptr instances that don't have the ref problems. Different problem, different primitive. In fact, it would be nice if one could tag a type explicitly as not containing any GC'd memory at all; such types are always trivially transferable between threads with "classic" (aka error-prone) techniques like locks.
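To make the create/ptr route concrete, here is a minimal sketch (my own illustration, not code from this thread) of a GC-free object managed manually on the shared heap; the Counter type and the procs around it are hypothetical:

```nim
import std/locks

type
  Counter = object      # contains no GC'd fields, so sharing it is unproblematic
    lock: Lock
    value: int

proc newCounter(): ptr Counter =
  result = createShared(Counter)   # manual allocation on the shared heap
  initLock(result.lock)

proc increment(c: ptr Counter) =
  withLock c.lock:                 # "classic" locking, as described above
    inc c.value

proc free(c: ptr Counter) =
  deinitLock(c.lock)
  deallocShared(c)                 # manual deallocation; no refcounting involved

let c = newCounter()
c.increment()
echo c.value   # 1
c.free()
```

Because the object holds no GC'd memory, passing the ptr between threads involves no refcounts at all; the cost is that lifetime and lock discipline are entirely on the programmer, which is the trade-off discussed next.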

Keep in mind that manually managed locks are already prone to all the resource-management issues that manual memory allocation exhibits, and more: you have to deal with taking locks, releasing them, and doing so in the correct order to avoid deadlocks. Calling dealloc at the end of the operation is the least of your concerns if your desire for hard-to-debug code is this high.

@guzba (Contributor) commented May 7, 2023

Avoiding any debate on the points you've made, since it won't really affect things, I'd like to be concrete about where I am coming from.

I have recently written a bunch of open source threaded Nim code and it devolves to ptr + manual memory management basically right away. ref object is unusable. Is this truly the best Nim can do?

I get that people want the unique-ownership ideal, but it doesn't work at all today, and it doesn't work for things that I don't think are exceptional cases. It seems like a Procrustean bed to me, as someone productively using threads in production right now.

@mratsim (Collaborator) commented May 8, 2023

Database connection pools are an example that comes to mind right away, just to provide a concrete example:

There are very few resources that map naturally to an atomic ref in my experience: databases and handles to GPUs.
For those resources, the atomic-ref overhead is negligible compared to the overhead of DB IO and GPU memcpy.

For other cases where sharing is desirable:

  • either fully shared memory with algorithmic guarantees that no data race can occur (parallel-for as in most scientific computing) is preferred
  • or the producer-consumer pattern is preferred (microservices / unix pipes-like division of work)

My issue with an explicit construct is that we've now created a whole new class of thing that is different from other things, and the only reason is either 1) performance, which is a measurable trade-off, maybe a non-issue, and avoidable with --threads:off, or 2) to require programmers to do things the way you want them to, when that way is provably not chosen where alternatives are available.

We want explicitness to avoid sweeping under the rug a very complex class of problems that has plagued developers for decades. And for quality software, we want to be able to direct security auditors to "grep for type Foo = atomicRef object and pay special attention to how those are used."

The mindset needed to review the correctness of shared primitives is completely different from that for single-threaded primitives. It's very important to have an explicit keyword to trigger that change of mindset.

I have recently written a bunch of open source threaded Nim code and it devolves to ptr + manual memory management basically right away. ref object is unusable. Is this truly the best Nim can do?

Currently there is at least one missing abstraction layer in Nim above createThread.

The reason that abstraction layer is missing is that it's an extremely hard and tangled problem, which also might suffer from there being no one-size-fits-all:

  • Nim's default memory management was (and still is) thread-local. There were very few incentives to change the status quo:
    • Newbie devs had plenty on their plate just learning Nim.
    • The very few companies involved needed a product to ship first before engaging in something more experimental.
    • That only left projects that grew within the room allotted by the Nim stdlib and are now constrained by it in trying to improve the situation, with many uncertainties on both the technical details and the delivery time.
  • It's strongly tied to async:
    • async uses ref objects everywhere, and ref objects are a non-starter for multithreading
    • the async dispatcher is a strong dependency and a hard problem
    • coloring issues (one can argue that ref vs atomic ref is also a form of coloring)
  • It's strongly tied to closure iterators and the concept of resumable functions. Closure iterators have had some of the nastiest bugs, runtime-only and in exceptional paths (try/finally or defer; see "finally block doesn't execute in async proc" #19911), and there have been experiments like Continuation-Passing Style (CPS) to bring transformations that have been more formally reviewed and come with proofs of correctness in the literature.
  • It's strongly tied to exception handling, and exceptions use ref objects.

I have done very extensive research into how new languages approached this with extensive hindsight (Kotlin, Swift, C++20), into the "old new" (Go, Rust), new directions (Pony), proven models (Lua, Erlang), and also academic languages (the ML family with CPS), but all had to make tradeoffs in performance, usability, the workloads supported naturally, and so on.

@Araq (Member) commented May 8, 2023

This is not really the right place for the discussion but ok, here are my thoughts:

  1. We need both:
    • The ability to use a lock effectively on a ref. I still wonder why people struggle with this; it seems really easy if you know the disarm-destructor trick. Note that the locking can be done outside a library, in the caller, making code thread-safe that previously wasn't. This is a tremendous advantage, as it allows one to use 3rd-party code easily, and that 3rd-party code does not have to make a choice about ref vs atomic ref.
    • isolate for effective message passing of isolated graphs. We have the mechanism for it but fight with the details of the spec.
  2. Neither solution needs atomicArc, and indeed it seems a step in the wrong direction to me too. However, maybe your CPU doesn't mind the overhead, and it feels a bit strange not to allow this mode of operation for the people who know what they are doing. It can also be seen as a valuable debugging tool ("would the sanitizer be happier with atomic instructions? let's see, with this simple switch").
  3. We really need to investigate why --threads:on is so expensive! Is thread-local storage still slow? Is the shiny new shared allocator to blame?
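As a rough illustration of the isolate mechanism mentioned above, here is a sketch using std/isolation as it exists in today's stdlib (my own example, not code from this PR; the Msg type is hypothetical):

```nim
import std/isolation

type Msg = ref object
  data: string

# isolate() checks at compile time that the expression produces a graph
# with no external aliases, so the graph can be moved to another thread
# (e.g. through a channel) without shared ownership ever arising.
var iso: Isolated[Msg] = isolate(Msg(data: "hello"))

# The receiving side takes sole ownership back out of the wrapper.
let msg = extract iso
echo msg.data   # prints "hello"
```

The point of the mechanism is exactly the unique-ownership guarantee discussed earlier in the thread: no atomic refcounting is needed because, by construction, only one thread can reach the graph at a time.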

@Araq (Member) commented May 8, 2023

Merging it to keep it from bitrotting but it won't become official anytime soon.

@Araq Araq merged commit 4533e89 into devel May 8, 2023
@Araq Araq deleted the pr_atomic_arc branch May 8, 2023 14:25
@github-actions bot commented May 8, 2023

Thanks for your hard work on this PR!
The lines below are statistics of the Nim compiler built from 4533e89

Hint: mm: orc; opt: speed; options: -d:release
167104 lines; 8.955s; 613.176MiB peakmem
