New channels implementation for ORC #17305

ringabout · 2021-03-09T05:53:24Z

The new implementation is mostly based on https://github.com/mratsim/weave/blob/5696d94e6358711e840f8c0b7c684fcc5cbd4472/unused/channels/channels_legacy.nim.

So I keep the license and the link.

Related article:
nim-lang/website#274

lib/std/channel.nim

lib/std/channels.nim

timotheecour · 2021-03-12T02:29:27Z

lib/std/channels.nim

+
+func peek*[T](c: Channel[T]): int {.inline.} = peek(c.d)
+
+proc newChannel*[T](elements = 30): Channel[T] =


document that elements = 1 will cause channel to be unbuffered

magic number 30 sounds weird; how about making the default unbuffered instead (and using an impossible value, say -1, to denote that the channel is unbufferred)

isn't typedesc param more common now-days? [T] is mostly useful for implicit generic instantiation

timotheecour · 2021-03-12T04:04:41Z

/cc @mratsim
why are we using something like a BoundedQueue (refs https://github.com/mratsim/weave/blob/2e528bd2a6a04306fd029f04a0af87fa201e5d89/weave/datatypes/bounded_queues.nim#L42) instead of a std/deques?

with current API, you'd need to know ahead of time the capacity to use, which is in practice really hard: we have an impossible dilemma to solve:

either use a capacity that's too large (over-provisions, wastes memory)
or use a capacity that's too small (under-provisions, causes wait's when buffer fills up, affecting performance)

with std/queues, you avoid this impossible dilemma and let the buffer grow as needed. We could improve this by adding a maximum cap on the deques underlying buffer but that's a minor addition to an existing module.

Furthermore, even if somehow a BoundedQueue would be better, IMO that part could be factored out and reused, it'd be useful not just in the context of std/channels

timotheecour

LGTM to make progress, with no expectation of API stability at this point.

I'm really curious about whether we should instead use std/dequeues, and a few other points, but can be addressed in followup work

Araq · 2021-03-12T07:38:09Z

with std/queues, you avoid this impossible dilemma and let the buffer grow as needed. We could improve this by adding a maximum cap on the deques underlying buffer but that's a minor addition to an existing module.

In my experience this is the far better design than unbounded queues, these tend to consume way too much memory and you lose the flow control aspect of bounded queues. Golang also uses bounded queues as far as I know and never revisited this design decision.

mratsim · 2021-03-12T09:14:29Z

/cc @mratsim
why are we using something like a BoundedQueue (refs https://github.com/mratsim/weave/blob/2e528bd2a6a04306fd029f04a0af87fa201e5d89/weave/datatypes/bounded_queues.nim#L42) instead of a std/deques?

with current API, you'd need to know ahead of time the capacity to use, which is in practice really hard: we have an impossible dilemma to solve:
* either use a capacity that's too large (over-provisions, wastes memory)

* or use a capacity that's too small (under-provisions, causes wait's when buffer fills up, affecting performance)
with std/queues, you avoid this impossible dilemma and let the buffer grow as needed. We could improve this by adding a maximum cap on the deques underlying buffer but that's a minor addition to an existing module.

Furthermore, even if somehow a BoundedQueue would be better, IMO that part could be factored out and reused, it'd be useful not just in the context of std/channels

There are multiple reasons:

Very often, either the producer is much faster than the consumer or the consumer is much faster than the producer.
There are 3 ways to handle the first case at the channel level, buffering, blocking or dropping.
- Buffering is the unbounded queue solution you propose, remember that ultimately they are bounded by your memory
- Dropping would be a ring buffer with the oldest message being overwritten
- Blocking is the bounded queue
Additionally, the producer can throttle itself and adapt it's rate of message, which assumes it has some feedback about the consumer consumption speed.

I think we can all agree that dropping is too niche for a standard library solution which leaves discussing buffering or blocking.

Buffering is an out-of-memory waiting to happen, with no recourse for the producer to detect if it's too fast. People need to think about their buffer size in a multithreading scenario and bounded queues are needed for backpressure management. This allows producers to adapt and potentially throttle themselves in case the consumer is overloaded. Usually, the messages are coming from IO (say networking), and passed via channels to a compute thread, blocking channels are the way for compute thread to communicate that it's overloaded and that messages cannot be handled so an alternative should be found (maybe throttle the source, maybe deactivate non-essential services/computations, maybe ban it because it's a DOS, ...).

On the other hand, you can easily emulate an unbounded channel from a bounded channel by using trySend and if it blocks, putting the data in your own buffer in front of the channel. It's also quite possible that the buffering shouldn't happen in the channel or just in front of it but more upstream in the "offending" producer because multiple-producer can enqueue messages in the channel and only one can misbehave, or some should have higher priorities than other in case of congestion.
You can't use the 2 locks solution with unbounded channels because on reaching the bound, the producer must lock the consumer out while resizing. Hence you get higher lock contention and worse performance.
When you need unbounded channels, usually you use list-based channels.

References on backpressure:

timotheecour · 2021-03-12T09:25:38Z

you're comparing BoundedQueue vs unbounded std/dequeues, but I was referring to std/dequeues with a max capacity beyond which you'd fallback to essentially the same as a BoundedQueue.

the initial size parameter becomes irrelevant, and doesn't even need to be a user parameter, could be set to a fixed small value (say 0) as initial queue capacity
the capacity of the dequeue keeps doubling (as in deques) as needed until a max capacity is reached
then it falls back to the BoundedQueue behavior

This gives best of both worlds:

no risk of over or under provision
no unbounded memory use, in fact same memory use as BoundedQueue in the worst case; the max capacity parameter (the only parameter) is not very sensitive since if you overprovision it, you won't pay the memory cost unless you actually need it (eg: producer faster than consumer)
as future refinement, it could also shrink using standard algorithm (2 hysteresis thresholds) which can be useful to dynamically adapt while maintaining resource usage proportional to current needs in a long running scenario

Araq · 2021-03-12T09:43:06Z

We don't have an implementation for your "best of both worlds" though and it's not clear how it would perform.

mratsim · 2021-03-12T09:46:01Z

std/dequeues does over-provision because it requires power of 2 size, so if you use 33 as your capacity, you will allocate for 64 slots and pay for that cost. The current channels proposed allocate exactly what the user wants and provide excellent performance and overhead characteristics.

Now std/dequeues implementation can be improved to use the current PR backend implementation but that's for a separate PR I think.

For specialized needs, the producer can compose the base channel with a buffer in front that provides the extra leeway needed, they can choose a small channel and implement a buffer with the hysteresis strategy you mention as ans example.

In general, all the dynamism and complex logic should be done outside of the base channel.
Alternatively they might need to write their own channels, but it's easier to swap a simple channel and reuse their additional logic than reimplement all of what they did on some channels.

I've mentioned it in the channels API RFC PR nim-lang/RFCs#347 (comment)
There is no "2 worlds" regarding channels, we should provide a good default and compose well because it is impossible to write a channel that satisfies all the potential expectations of a channel:

single-producer or multi-producer
single-consumer or multi-consumer
list-based or array-based or hybrid
- If list-based:
  - intrusive linked list or non-intrusive
bounded or unbounded
- if bounded:
  - Errors on reaching bounds
  - Blocks on reaching bounds
  - Overwrite the oldest item on reaching bounds (ring-buffer)
lock-based or atomic-based
progress guarantees:
- no guarantees (whether using locks or atomics)
- obstruction-free
- lockfree
- waitfree
priorities support
item ordering guarantees
- no guarantees
- serializability
- linearizability
Buffered or unbuffered (rendez-vous)

Araq · 2021-03-12T09:57:54Z

is there anything preventing std/channels from being usable with gc:refc (not necessarily now but in future work) ?
at least a subset of the API or for a subset of the types being sent/received

I think there is no good way to make these work with the old GCs. We will eventually make --gc:orc the default for a reason. ;-)

if so then API's in system/channels could be eventually deprecated and removed

That's true.

timotheecour · 2021-03-12T10:00:18Z

std/dequeues does over-provision because it requires power of 2 size, so if you use 33 as your capacity, you will allocate for 64 slots and pay for that cost

it doesn't over-provision, you do as follows:
given user provided max capacity C (C = 33 in your case):
you double capacity as needed (starting from 1) until it reaches the max power of 2 <= C (ie, 32 in your case), and when this fills up, you increase the length 1 final time to C ( = 33) and switch to BoundedQueue regime.

so yes, you do get best of both worlds: no overprovisioning and pay-what-you-actually-use, instead of a max capacity.

performance wise, it's at least as good as BoundedQueue.

Araq · 2021-03-12T12:32:49Z

Merging for now, we can refine it later.

* Update lib/std/channels.nim * Rename tchannel_pthread.nim to tchannels_pthread.nim * Rename tchannel_simple.nim to tchannels_simple.nim Co-authored-by: Mamy Ratsimbazafy <[email protected]>

ringabout added 10 commits March 8, 2021 20:28

improve test coverage for isolation

3273fd7

a bit better

b4fa432

Merge remote-tracking branch 'upstream/devel' into a141

421cf08

prepare Nim channel implementation for ORC

4124ddf

add testcase

2bd38ae

more

13c03bb

chan

34ac3d5

more

26262ef

arc matrix

ffefb11

use more result

7076286

ringabout marked this pull request as ready for review March 9, 2021 08:06

ringabout requested a review from timotheecour March 9, 2021 08:07

ringabout added 2 commits March 9, 2021 16:15

use orc

75161fb

fix

db8b10d