adds an experimental `mm:atomicArc` switch #21798

Conversation
This kind of kicks the can down the road: you still have all the shared mutable ownership problems, just at a different level of the code, while all code pays a price. For threading, the ability to express unique ownership is the constraint that makes passing data between threads easy: you guarantee at all levels that nobody else is accessing the data.
On my Ryzen CPU I get these numbers for bootstrapping:

- ARC: 165494 lines; 7.864s; 489.047MiB peakmem

Likewise for the "binary trees" benchmark, but that benchmark is flawed for this comparison so there is no need to report the numbers.

On OSX M1:

- atomicArc: 167027 lines; 5.773s; 490.902MiB peakmem
Benchmark on Ubuntu 23.04, Intel i5 6th gen 1.6GHz, 4 cores / 8 threads.

```
./koch boot -d:release --threads:on --mm:arc
Hint: mm: arc; threads: on; opt: speed; options: -d:release

./koch boot -d:release --threads:on --mm:atomicArc
Hint: mm: atomicArc; threads: on; opt: speed; options: -d:release
```

Over multiple runs I get results varying between 12-14s, with very little difference between atomicArc and arc.
For the sake of debate I will take the counter position to this statement (also posted on the Nim forum). With atomic ARC appearing to have negligible performance impact, I currently see atomic ARC as the better option versus the current ARC. It calls into question whether non-atomic ARC should even exist.

With this change, `ref` objects behave much more like they do in other very popular languages such as Swift, C#, Go and Java. This mental-model compatibility seems very valuable to me. Yes, you still need locks to mutate the state; we are all used to that from other languages, it's fine and a separate concern. Independent graphs do not capture all shared-heap use cases, so that seems like a pretty narrow focus. Without atomic ARC, `ref` objects can basically never be used with threads in the ways one would expect from other languages. With atomic ARC, even if there is a performance cost in some cases, the programmer can overcome that cost with some effort. Basically, in one direction there is an impossible hurdle; in the other there is a very manageable hurdle that often does not matter or even exist at all.

I do not see clearly how ORC fits into this work, though: while the RC would be atomic, it seems to me that ORC would need a global lock for adding and checking for cycles. Maybe this is fine too, considering that once again a programmer can just add {.acyclic.} to avoid any cost when it is a known non-issue. An atomic ARC but non-atomic ORC would be a strange mix.
Does this affect single-threaded code?
Yes, for …
Recent languages have struggled with how to deal safely with shared ownership. I think it's important to distinguish between …

Furthermore, recent languages, with the hindsight of C/C++ mutability issues, have strongly discouraged shared ownership and encouraged producer-consumer patterns and explicit transfer of ownership via channels (Go / Rust). Making all `ref` implicitly `atomicRef` is a step in the wrong direction for me, and I'd strongly prefer `atomicRef` to be explicit.
Sure, this is a great idea; in practice, however, one can observe that locks on a shared reference type are still how things end up being done, in spite of this encouragement. Database connection pools are a concrete example that comes to mind right away:

- Redis: https://github.com/redis/go-redis/blob/master/internal/pool/pool.go#L244 or https://github.com/gomodule/redigo/blob/master/redis/pool.go#L198
- Postgres: https://github.com/jackc/pgx/blob/master/pgxpool/pool.go#L489 + https://github.com/jackc/puddle/blob/master/pool.go#L344
- Rust seems to have turned this into "Rocket" science: https://github.com/SergioBenitez/Rocket/blob/master/contrib/sync_db_pools/lib/src/lib.rs (not sure if anyone really uses this, though). There is also https://github.com/sfackler/r2d2/blob/master/src/lib.rs#L418 (Mutex).

Now, perhaps I am finding the lock solutions because I'm posing a lock problem; still, there is clearly a need for either `atomicRef` or an explicit thing as suggested. So, past that: my issue with an explicit thing is that we've now created a whole new class of thing that is different from other things, and the only reason is either 1) performance, which is a measurable trade-off, maybe a non-issue, and avoidable with --threads:off, or 2) requiring programmers to do things the way you want when that way is provably not chosen where alternatives are available.

I am not so sure I know the answer for how to build things for everyone, so I suggest keeping things simple and avoiding further combinatorial multiplier factors (oh, that is an aref, which is special versus other refs, so you can't put refs on an aref but you can put an aref on a ref; we all know this problem). It is also the case that an aref would have all the issues of a ref under atomicArc, so it really only serves as a flag, coming with all that cost for such a small benefit. Thanks for letting me add my point of view and for sharing yours.
Further stressing this point: shared ownership is a problem in all contexts, and the problems get exacerbated wildly with any out-of-order execution. This includes …
+1. Even C++ has moved on to …
Incidentally, this also solves contention and a host of other multithreading / atomics-related performance issues.
In these exceptional cases where low-level control is desired, maybe it's not … Keep in mind that manually managed locks are already prone to all the resource-management issues that manual memory allocation exhibits, and more: you have to deal with taking locks, releasing them, and doing so in the correct order to avoid deadlocks, etc.; calling …
Avoiding any debate on the points you've made, since it won't really affect things, I'd like to be concrete about where I am coming from. I have recently written a bunch of open-source threaded Nim code, and it devolves to ptr + manual memory management basically right away. I get that people want the unique-ownership ideal, but it doesn't work at all today, and it doesn't work for things that I don't think are exceptional cases. As someone productively using threads in production right now, it seems like a Procrustean bed to me.
There are very few resources that map naturally to an atomic ref in my experience: databases and handles to GPUs. For other cases where sharing is desirable: …
We want explicit to avoid sweeping under the rug a very complex class of problems that has plagued developers for decades. And for quality software, we want to direct security auditors to "grep …". The mindset needed to review the correctness of shared primitives is completely different from that for single-threaded primitives. It's very important to have an explicit keyword to trigger that change of mindset.
Currently there is at least one missing abstraction layer in Nim above …
The reason that abstraction layer is missing is that it's an extremely hard and tangled problem, which might also suffer from having no one-size-fits-all solution:
I have done very extensive research into how new languages with extensive hindsight (Kotlin, Swift, C++20) build on the "old new" (Go, Rust), new directions (Pony), proven models (Lua, Erlang) and also academic languages (ML-family with CPS), but all had to make tradeoffs in performance, usability, the workloads supported naturally, ...
This is not really the right place for the discussion but ok, here are my thoughts:
Merging it to keep it from bitrotting, but it won't become official anytime soon.
Thanks for your hard work on this PR!

Hint: mm: orc; opt: speed; options: -d:release

ref https://forum.nim-lang.org/t/10161

Booting (Intel 12th gen i7):

- 6.374s (`--mm:arc --threads:on`)
- 7.107s (`--mm:atomicArc --threads:on`)