
adds an experimental mm:atomicArc switch #21798

Merged (1 commit, May 8, 2023)

Conversation

@ringabout (Member) commented May 6, 2023

ref https://forum.nim-lang.org/t/10161

Bootstrapping times (Intel 12th gen i7):

  • 6.374s (--mm:arc --threads:on)
  • 7.107s (--mm:atomicArc --threads:on)

@arnetheduck (Contributor) commented:

This kind of kicks the can down the road: you still have all the shared mutable ownership problems, just at a different level of the code, while all code pays a price.

For threading, the ability to express unique ownership is the constraint that makes passing data between threads easy: you guarantee at all levels that nobody else is accessing the data.

@Araq (Member) commented May 6, 2023

On my Ryzen CPU I get these numbers for bootstrapping:

ARC: 165494 lines; 7.864s; 489.047MiB peakmem;
Atomic ARC: 165494 lines; 7.989s; 489.059MiB peakmem;

Likewise for the "binary trees" benchmark, but that benchmark is flawed for this comparison, so there is no need to report the numbers.

On OSX M1:

atomicArc: 167027 lines; 5.773s; 490.902MiB peakmem;
arc: 167027 lines; 6.070s; 491.109MiB peakmem;

@jmgomez (Collaborator) commented May 6, 2023

M1 Max:
  • --mm:arc --threads:on: 6.042s
  • --mm:atomicArc --threads:on: 6.030s

Threadripper 24:
  • --mm:arc --threads:on: 18.247s
  • --mm:atomicArc --threads:on: 20.056s

@Clonkk (Contributor) commented May 6, 2023

Benchmark on Ubuntu 23.04, Intel i5 6th gen 1.6GHz - 4 core 8 threads.

./koch boot -d:release --threads:on --mm:arc

Hint: mm: arc; threads: on; opt: speed; options: -d:release
166803 lines; 14.771s; 490.719MiB peakmem;

./koch boot -d:release --threads:on --mm:atomicArc

Hint: mm: atomicArc; threads: on; opt: speed; options: -d:release
166803 lines; 14.022s; 490.762MiB peakmem;

Over multiple runs I get results varying between 12 and 14s, with very little difference between atomicArc and arc.

@guzba (Contributor) commented May 7, 2023

This kind of kicks the can down the road: you still have all the shared mutable ownership problems, just at a different level of the code, while all code pays a price.

For the sake of debate, I will take the counter-position to this statement (I also posted this on the Nim forum).

With atomic ARC appearing to have negligible performance impact, I currently see it as a better option than the current ARC. It calls into question whether non-atomic ARC should even exist.

With this change, ref objects behave much more like their counterparts in other very popular languages such as Swift, C#, Go, and Java. This mental-model compatibility seems very valuable to me.

Yes, you still need locks to mutate the state; we're all very used to that from other languages, it's fine and a separate concern. Independent graphs do not capture all shared-heap use cases, so it seems like a pretty narrow focus.

Without atomic ARC, ref objects can basically never be used in the same ways with threads as one would expect from other languages. With atomic ARC, even if there is a performance cost in some cases, the programmer can overcome that cost with some effort. Basically, in one direction there is an impossible hurdle, in the other there is a very manageable hurdle that often does not matter or even exist at all.

I do not see clearly how ORC fits into this work, though: while the RC would be atomic, it seems to me that ORC would need a global lock for adding and checking for cycles. Maybe this is fine too, considering that once again a programmer can just add {.acyclic.} to avoid any cost when it is a known non-issue. An atomic ARC but non-atomic ORC would be a strange mix.
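For illustration, this is roughly the usage pattern being argued for: a plain ref shared by several threads, with a lock guarding mutation while the refcount itself is handled by the runtime. This is my own hypothetical sketch, assuming this PR's --mm:atomicArc --threads:on; the gcsafe cast reflects that the workers touch a global containing GC'd memory.

```nim
import std/locks

type SharedCounter = ref object
  value: int

var
  guard: Lock
  counter = SharedCounter()   # refcount updates become atomic under atomicArc

proc worker() {.thread.} =
  {.cast(gcsafe).}:           # we promise that the lock makes this safe
    for _ in 1 .. 1000:
      withLock guard:         # a lock is still required to mutate the state
        inc counter.value

initLock(guard)
var threads: array[4, Thread[void]]
for t in threads.mitems:
  createThread(t, worker)
joinThreads(threads)
echo counter.value            # expected: 4000
```

Under non-atomic ARC the same program would race on the refcount whenever a worker copies the ref, which is exactly the failure mode this switch is meant to remove.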

@AmjadHD (Contributor) commented May 7, 2023

Does this affect single-threaded code?

@Araq (Member) commented May 7, 2023

Does this affect single-threaded code?

Yes for --threads:on, no for --threads:off.

@mratsim (Collaborator) commented May 7, 2023

Recent languages have struggled with how to deal safely with shared ownership.

I think it's important to distinguish between ref object and atomicRef object so that objects with shared ownership can be easily spotted, audited, and reviewed.

Furthermore, recent languages, with the hindsight of C/C++ mutability issues, have strongly discouraged shared ownership and encouraged producer-consumer patterns and explicit transfer of ownership via channels (Go / Rust).

Making every ref implicitly an atomicRef is a step in the wrong direction for me, and I'd strongly prefer atomicRef to be explicit.

@guzba (Contributor) commented May 7, 2023

encouraged producer-consumer patterns and explicit transfer of ownership via channels (Go / Rust)

Sure, this is a great idea; in practice, however, one can observe that locks on a shared reference type are still how things end up being done, in spite of this encouragement.

Database connection pools are an example that comes to mind right away, just to provide a concrete example:

Redis: https://github.com/redis/go-redis/blob/master/internal/pool/pool.go#L244 or https://github.com/gomodule/redigo/blob/master/redis/pool.go#L198

Postgres: https://github.com/jackc/pgx/blob/master/pgxpool/pool.go#L489 + https://github.com/jackc/puddle/blob/master/pool.go#L344

Rust seems to have turned this into "Rocket" science: https://github.com/SergioBenitez/Rocket/blob/master/contrib/sync_db_pools/lib/src/lib.rs. Not sure if anyone really uses this, though. There is also https://github.com/sfackler/r2d2/blob/master/src/lib.rs#L418 (Mutex).

Now, perhaps I am finding the lock solutions because I'm asking a lock question; however, there is clearly a need for either atomicRef or an explicit construct as suggested. So, past that:

My issue with an explicit construct is that we've now created a whole new class of thing that is different from other things, and the only reason is either 1) performance, which is a measurable trade-off, maybe a non-issue, and avoidable with --threads:off, or 2) to require programmers to do things the way you want them to, when that way is provably not chosen where alternatives are available.

I am not so sure I know the answer for how to build things for everyone, so I suggest keeping things simple and avoiding further combinatorial multipliers (oh, that is an aref, which is special compared to other refs, so you can't put refs on an aref but you can put an aref on a ref; we all know this problem). It is also the case that an aref would have all the issues of a ref under atomicArc, so it really only serves as a flag, coming with all that cost for such a small benefit.

Thanks for letting me add my pov and for sharing yours.

@arnetheduck (Contributor) commented May 7, 2023

Recent languages have struggled with how to deal safely with shared ownership.

further stressing this point: shared ownership is a problem in all contexts. The problems get exacerbated wildly with any out-of-order execution, and this includes async and other thread-like constructs. Nim is currently lacking in the single-ownership area in general; it is not a problem limited to threads (ergo, any solution to this problem should hopefully apply both with and without threads, since it is essentially the exact same problem).

... strongly discouraged shared ownership and encouraged producer-consumer patterns ... Making all ref atomicRef implicitly is a step in the wrong direction

+1. Even C++ has moved on to unique_ptr as its main ref-like construct.

producer-consumer

Incidentally, this also solves contention and a host of other multithreading / atomics-related performance issues.

shared reference type is still how things end up being done

In these exceptional cases where low-level control is desired, maybe it's not ref you're looking for? create can be used to make perfectly fine user-managed ptr instances that don't have the ref problems. Different problem, different primitive. In fact, it would be nice if one could tag a type explicitly as not containing any GC'd memory at all; such types are always trivially transferable between threads with "classic" (aka error-prone) techniques like locks.
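To make the create/ptr route concrete, here is a minimal sketch (my own illustration, not code from this thread) of a GC-free object managed manually on the shared heap; the Counter type and the procs around it are hypothetical:

```nim
import std/locks

type
  Counter = object      # contains no GC'd fields, so sharing it is unproblematic
    lock: Lock
    value: int

proc newCounter(): ptr Counter =
  result = createShared(Counter)   # manual allocation on the shared heap
  initLock(result.lock)

proc increment(c: ptr Counter) =
  withLock c.lock:                 # "classic" locking, as described above
    inc c.value

proc free(c: ptr Counter) =
  deinitLock(c.lock)
  deallocShared(c)                 # manual deallocation; no refcounting involved

let c = newCounter()
c.increment()
echo c.value   # 1
c.free()
```

Because the object holds no GC'd memory, passing the ptr between threads involves no refcounts at all; the cost is that lifetime and lock discipline are entirely on the programmer, which is the trade-off discussed next.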

Keep in mind that manually managed locks are already prone to all the resource-management issues that manual memory allocation exhibits, and more: you have to deal with taking locks, releasing them, and doing so in the correct order to avoid deadlocks. Calling dealloc at the end of the operation is the least of your concerns if your desire for hard-to-debug code is this high.

@guzba (Contributor) commented May 7, 2023

Avoiding any debate on the points you've made, since it won't really affect things, I'd like to be concrete about where I am coming from.

I have recently written a bunch of open source threaded Nim code and it devolves to ptr + manual memory management basically right away. ref object is unusable. Is this truly the best Nim can do?

I get that people want the unique-ownership ideal, but it doesn't work at all today, and it doesn't work for things that I don't think are exceptional cases. It seems like a Procrustean bed to me, as someone productively using threads in production right now.

@mratsim (Collaborator) commented May 8, 2023

Database connection pools are an example that comes to mind right away, just to provide a concrete example:

There are very few resources that map naturally to an atomic ref in my experience: databases and handles to GPUs.
For those resources, the atomic-ref overhead is negligible compared to the overhead of DB IO and GPU memcpy.

For other cases where sharing is desirable:

  • either fully shared memory with algorithmic guarantees that no data race can occur (parallel-for as in most scientific computing) is preferred
  • or the producer-consumer pattern is preferred (microservices / unix pipes-like division of work)

My issue with an explicit construct is that we've now created a whole new class of thing that is different from other things, and the only reason is either 1) performance, which is a measurable trade-off, maybe a non-issue, and avoidable with --threads:off, or 2) to require programmers to do things the way you want them to, when that way is provably not chosen where alternatives are available.

We want explicitness to avoid sweeping under the rug a very complex class of problems that has plagued developers for decades. And for quality software, we want to be able to direct security auditors to "grep for type Foo = atomicRef object and pay special attention to how those are used."

The mindset needed to review the correctness of shared primitives is completely different from that for single-threaded primitives. It's very important to have an explicit keyword to trigger that change of mindset.

I have recently written a bunch of open source threaded Nim code and it devolves to ptr + manual memory management basically right away. ref object is unusable. Is this truly the best Nim can do?

Currently there is at least one missing abstraction layer in Nim above createThread.

The reason that abstraction layer is missing is that it's an extremely hard and tangled problem, which also might suffer from there being no one-size-fits-all:

  • Nim's default memory management was (and still is) thread-local. There were very few incentives to change the status quo:
    • Newbie devs had plenty on their plate just learning Nim.
    • The very few companies involved needed a product to ship first before engaging in something more experimental.
    • That only left projects that grew within the room allotted by the Nim stdlib and are now constrained by it in trying to improve the situation, with many uncertainties on both the technical details and the delivery time.
  • It's strongly tied to async:
    • async uses ref objects everywhere, and ref objects are a non-starter for multithreading
    • the async dispatcher is a strong dependency and a hard problem
    • coloring issues (one can argue that ref vs atomic ref is also a form of coloring)
  • It's strongly tied to closure iterators and the concept of resumable functions. Closure iterators have had some of the nastiest bugs, runtime-only and in exceptional paths (try/finally or defer; see "finally block doesn't execute in async proc" #19911), and there have been experiments like Continuation-Passing Style (CPS) to bring transformations that have been more formally reviewed and come with proofs of correctness in the literature.
  • It's strongly tied to exception handling, and exceptions use ref objects.

I have done very extensive research into how new languages approached this with extensive hindsight (Kotlin, Swift, C++20), into the "old new" (Go, Rust), new directions (Pony), proven models (Lua, Erlang), and also academic languages (the ML family with CPS), but all had to make tradeoffs in performance, usability, the workloads supported naturally, and so on.

@Araq (Member) commented May 8, 2023

This is not really the right place for the discussion but ok, here are my thoughts:

  1. We need both:
    • The ability to use a lock effectively on a ref. I still wonder why people struggle with this; it seems really easy if you know the disarm-destructor trick. Note that the locking can be done outside a library, in the caller, making code thread-safe that previously wasn't. This is a tremendous advantage, as it allows one to use 3rd-party code easily, and that 3rd-party code does not have to make a choice about ref vs atomic ref.
    • isolate for effective message passing of isolated graphs. We have the mechanism for it but fight with the details of the spec.
  2. Neither solution needs atomicArc, and indeed it seems a step in the wrong direction to me too. However, maybe your CPU doesn't mind the overhead, and it feels a bit strange not to allow this mode of operation for the people who know what they are doing. It can also be seen as a valuable debugging tool ("would the sanitizer be happier with atomic instructions? let's see, with this simple switch").
  3. We really need to investigate why --threads:on is so expensive! Is thread-local storage still slow? Is the shiny new shared allocator to blame?
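As a rough illustration of the isolate mechanism mentioned above, here is a sketch using std/isolation as it exists in today's stdlib (my own example, not code from this PR; the Msg type is hypothetical):

```nim
import std/isolation

type Msg = ref object
  data: string

# isolate() checks at compile time that the expression produces a graph
# with no external aliases, so the graph can be moved to another thread
# (e.g. through a channel) without shared ownership ever arising.
var iso: Isolated[Msg] = isolate(Msg(data: "hello"))

# The receiving side takes sole ownership back out of the wrapper.
let msg = extract iso
echo msg.data   # prints "hello"
```

The point of the mechanism is exactly the unique-ownership guarantee discussed earlier in the thread: no atomic refcounting is needed because, by construction, only one thread can reach the graph at a time.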

@Araq (Member) commented May 8, 2023

Merging it to keep it from bitrotting but it won't become official anytime soon.

@Araq Araq merged commit 4533e89 into devel May 8, 2023
@Araq Araq deleted the pr_atomic_arc branch May 8, 2023 14:25
@github-actions bot commented May 8, 2023

Thanks for your hard work on this PR!
The lines below are statistics of the Nim compiler built from 4533e89

Hint: mm: orc; opt: speed; options: -d:release
167104 lines; 8.955s; 613.176MiB peakmem
