
refactor channels.trySend/tryRecv and improve tests #74

Merged · 22 commits · Aug 26, 2024

Conversation

@planetis-m (Contributor) commented Aug 13, 2024

This PR refactors the logic of the non-blocking channel operations as well as the relevant tests.

  • Forces trySend calls in tchannels_singlebuf.nim to run multiple times before succeeding.
  • Changes the signature to proc trySend*[T](c: Chan[T], src: var Isolated[T]): bool, as discussed in the comments.
  • Reworks the glue logic between Chan and ChannelRaw in send/trySend to prevent copies (uses Isolated[T] directly).
  • Adds a warning message to the trySend template to inform the user about unwanted copies.
  • Uses tryAcquire instead of acquire for the non-blocking variants. A warning about increased lock contention is added.
  • Removes incorrect warning messages about blocking from the trySend/tryRecv docs (see comments).
  • Makes recvIso use Isolated[T] directly as well, which fixes receiving ref objects.
  • Removes a useless assert in newChan, since that condition is already covered by the Positive range type.

@planetis-m (Contributor, Author) commented Aug 13, 2024

Currently trySend does not follow the API introduced in nim-lang/RFCs#347, which produces different code, as is evident from the tchannels_singlebuf test.

proc trySend*[T](c: Chan[T], src: sink Isolated[T]): bool

template trySend*[T](c: Chan[T], src: T): bool =
  trySend(c, isolate(src))
# --expandarc:trySend
var data
data = extract(src)
result = channelSend(c.d, addr(data), 16, 0)
if result:
  wasMoved(data)
`=destroy`(src)
`=destroy`(data)

# --expandarc:test
var msg_cursor
var notSent = true
msg_cursor = "Hello"
block :tmp:
  while notSent:
    var :tmpD
    notSent = not
      mixin isolate
      trySend(chan, isolate do:
        :tmpD = `=dup`(msg_cursor)
        :tmpD)
    if notSent:
      atomicInc(attempts, 1)

While the following modification generates:

proc trySend*[T](c: Chan[T], src: var Isolated[T]): bool

template trySend*[T](c: Chan[T], src: T): bool =
  mixin isolate
  var p = isolate(src)
  trySend(c, p)
# --expandarc:trySend
var data
data = extract(src)
result = channelSend(c.d, addr(data), 16, 0)
if result:
  wasMoved(data)
`=destroy`(data)

# --expandarc:test
var msg_cursor
var notSent = true
msg_cursor = "Hello"
block :tmp:
  while notSent:
    var
      p`gensym0
      :tmpD
    notSent = not
      mixin isolate
      p`gensym0 = isolate do:
        :tmpD = `=dup`(msg_cursor)
        :tmpD
      trySend(chan, p`gensym0)
    if notSent:
      atomicInc(attempts, 1)
    `=destroy`(p`gensym0)

@planetis-m (Contributor, Author) commented Aug 13, 2024

@Araq the changes now seem correct. FWIW, the trySend template should be avoided in favor of:

  proc test(chan: Chan[string]) {.thread.} =
    var notSent = true
    var msg = Message  # `Message` is the payload value defined in the test
    var p = isolate(msg)
    while notSent:
      notSent = not chan.trySend(p)  # no shadowing: update the loop condition
      if notSent:
        atomicInc(attempts)

Done.


@ZoomRmc (Contributor) commented Aug 14, 2024

This PR makes a significant change to the channel implementation by adding tryAcquire to the non-blocking routines, and it definitely should not be piggybacked on what looks, on the surface, like a test refactoring.

@planetis-m (Contributor, Author) commented Aug 14, 2024

@ZoomRmc it fixes a major issue with the non-blocking variant, namely:

Blocking is still possible if another thread uses the blocking version of the send/recv procs and waits for the data/space to appear in the channel, thus holding the internal lock to the channel's buffer.

At least to me, it makes no sense to keep this limitation as the function's contract. Can you name a reason I may be missing?

@planetis-m (Contributor, Author)

Now that I re-read the warning, I believe it was incorrect to begin with: waiting is done only by the blocking send/recv calls, and the wait function releases the lock automatically while waiting. So the tryAcquire change, if anything, should reduce lock contention caused by trySend/tryRecv calls when the channel is full/empty.
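
To illustrate why waiting does not hold the lock, here is a minimal sketch of the standard condition-variable pattern (illustrative only, not this module's actual code; `full` and `spaceAvailable` are stand-in names). The wait call releases the lock while the thread sleeps, so a blocked send does not lock out trySend callers.

import std/locks

var
  lock: Lock
  spaceAvailable: Cond
  full: bool

initLock(lock)
initCond(spaceAvailable)

proc blockingSend() =
  acquire(lock)
  while full:
    # wait() atomically releases `lock` while this thread sleeps and
    # reacquires it before returning, so other threads can enter
    wait(spaceAvailable, lock)
  # ... write the item into the buffer and signal blocked receivers ...
  release(lock)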

@planetis-m (Contributor, Author) commented Aug 14, 2024

Here's what Claude 3.5 thinks about this change, after I corrected it (it assumed that trySend/tryRecv used the condition variables before): https://claude.site/artifacts/fec48ef8-43da-4d49-949b-f3c94c305d21

@ZoomRmc (Contributor) commented Aug 14, 2024

Sorry, LLMs are useful for rubberducking, but they are laughably bad at reasoning about concurrency unless driven to consider all the edge cases by the user, at which point the user has most probably already built the mental model in their head and there's no point in quoting the LLM's output.

At the very least, not blocking on lock acquisition means you're probably trading one possible failure mode for another: a deadlock for an infinite loop.

I can't think of anything being wrong with the change off the cuff, but my gut tells me it needs careful consideration and discussion.

Even if it's all sound, this is a significant change that needs to be thought through for multiple possible combinations of [try]send/[try]recv and numbers of senders and receivers. It would be proper to have the expected logic laid out in the PR description.

@planetis-m (Contributor, Author)

@ZoomRmc alright, I will run a benchmark later tonight to counter your criticism; then I'll have a better picture of this change. However, I am not sure I can test every variation you require, unless you can propose some existing benchmark.

It would be proper to have the expected logic

I will update the PR's description. However, I am curious whether you can point me to another PR where your reasoning is explained in detail; if anything, I am interested in learning.

Cheers!

@planetis-m (Contributor, Author)

LLMs are useful for rubberducking but they are laughably bad at reasoning.

Can you be more explicit about which part(s) of the artifact you disagree with, just so we're on the same page?

@ZoomRmc (Contributor) commented Aug 14, 2024

Can you be more explicit about which part(s) of the artifact you disagree with, just so we're on the same page?

I'm just against using an LLM's output as supporting evidence for a point in a hard context in general. They are designed to produce extremely convincing and at the same time satisfying answers, just like any eager person with a huge confirmation bias.

I will update the PR's description. However, I am curious whether you can point me to another PR where your reasoning is explained in detail.

Unfortunately, I don't think I really can. There are a few questions regarding the correctness of the implementation that I posed at different times (such as this "BTW" part). I also commented on a couple of issues going through some parts of the logic.

However, I'm neither the author nor an expert. The current implementation mostly follows the original by @aprell and relies on his thesis being correct, plus gradual fixing of discovered bugs and the accompanying discussions happening in this repo.

I don't really have any criticism at the moment, as I'm in a different headspace. (I might have some later, though :D)

I'm just warning against making changes to the logic in a "test" PR, so it doesn't fly under the radar of anyone who's interested in the module's implementation. In my opinion this requires splitting the PR and, perhaps, warrants its own issue.

@planetis-m changed the title from "better tests for edge cases" to "refactor channels.trySend/tryRecv and improve tests" on Aug 14, 2024
@planetis-m (Contributor, Author) commented Aug 14, 2024

Another part of the PR that changed in accordance with RFC #347 is the removal of the extract logic:

# Current implementation

# a sink parameter in loops creates copies
proc trySend*[T](c: Chan[T], src: sink Isolated[T]): bool {.inline.} =
  var data = src.extract # basically: bitwise-copy src.value into data, then wasMoved(src.value)
  result = channelSend(c.d, data.addr, sizeof(T), false)
  if result:
    wasMoved(data)
  # data still needs to be destroyed (it holds the value if the send failed)

# Proposed changes
proc trySend*[T](c: Chan[T], src: var Isolated[T]): bool {.inline.} =
  result = channelSend(c.d, src.addr, sizeof(T), false) # no extract call
  if result:
    wasMoved(src)
  # no destructor call here, as src is a var parameter owned by the caller

I also posted this on Discord for feedback and am reposting it here so it doesn't get lost. IMO it's correct, as my SPSC channel implementation in planetis/sync went through similar changes.
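
For illustration, here is the caller-side difference the two signatures imply (a sketch, assuming a chan: Chan[string] and a msg: string are in scope and the overloads are as defined above):

# sink version: a failed attempt consumed the Isolated[T], so every retry
# must re-isolate msg; isolating a still-live msg inserts `=dup`, i.e. a copy
while not chan.trySend(isolate(msg)):
  cpuRelax()

# var version: isolate once; on failure p survives and is retried as-is,
# so no copies are made and T does not even have to be copyable
var p = isolate(msg)
while not chan.trySend(p):
  cpuRelax()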

@planetis-m (Contributor, Author) commented Aug 14, 2024

@ZoomRmc from your comment in #27:

BTW, what happens if there are somehow more procs blocked on a signal than procs that will manage to successfully send and return? Probably a deadlock, since we never broadcast, only signal. @ringabout, since you've written this impl, did you think about this situation?

This is definitely a valid argument for using broadcast, or we might even need to eventually rethink how channels is implemented and reuse logic similar to https://github.com/planetis-m/sync/blob/f0174325d4c66e28475781bc0a436e7ab245acaf/tests/tsemaphore.nim

The current implementation mostly follows the original by @aprell and relies on his thesis being correct plus gradual fixing of discovered bugs and accompanying discussions happening in this repo.

I see that trySend/tryRecv definitely follows the logic from that repo, but send/recv are not in there. And a big difference is that there are TWO locks in that design, while here there's only one! The implementation has changed significantly, so I don't see your point about a thesis from another project being correct.

This is definitely something I need to change in a follow-up PR.

@planetis-m (Contributor, Author) commented Aug 14, 2024

Alright, benchmark results:

Benchmark code
import threading/channels, std/[os, times, stats, cpuinfo, atomics]

type
  Message = array[10, int]

const
  NumIterations = 1_000_000
  ChannelSize = 100

type
  ThreadArg = object
    chan: Chan[Message]
    iterations: int
    successCount: ptr Atomic[int]
    failureCount: ptr Atomic[int]

proc producerThread(arg: ThreadArg) {.thread.} =
  var msg: Message
  for i in 1..arg.iterations:
    msg[0] = i  # Just to have some changing data
    if arg.chan.trySend(msg):
      discard arg.successCount[].fetchAdd(1)
    else:
      discard arg.failureCount[].fetchAdd(1)
    cpuRelax()

proc consumerThread(arg: ThreadArg) {.thread.} =
  var msg: Message
  for i in 1..arg.iterations:
    if arg.chan.tryRecv(msg):
      discard arg.successCount[].fetchAdd(1)
    else:
      discard arg.failureCount[].fetchAdd(1)
    cpuRelax()

proc runBenchmark(): (float, int, int, int, int) =
  let numThreads = countProcessors()
  let numProducers = numThreads div 2
  let numConsumers = numThreads - numProducers

  var chan = newChan[Message](elements = ChannelSize)
  var producers: seq[Thread[ThreadArg]]
  var consumers: seq[Thread[ThreadArg]]
  producers.setLen(numProducers)
  consumers.setLen(numConsumers)

  var sendSuccessCount, sendFailureCount, recvSuccessCount, recvFailureCount: Atomic[int]
  sendSuccessCount.store(0)
  sendFailureCount.store(0)
  recvSuccessCount.store(0)
  recvFailureCount.store(0)

  let iterationsPerThread = NumIterations div numThreads

  let producerArg = ThreadArg(
    chan: chan,
    iterations: iterationsPerThread,
    successCount: addr sendSuccessCount,
    failureCount: addr sendFailureCount
  )

  let consumerArg = ThreadArg(
    chan: chan,
    iterations: iterationsPerThread,
    successCount: addr recvSuccessCount,
    failureCount: addr recvFailureCount
  )

  let start = cpuTime()  # note: cpuTime measures the calling thread's CPU time, not wall-clock time

  # Start producer threads
  for i in 0 ..< numProducers:
    createThread(producers[i], producerThread, producerArg)

  # Start consumer threads
  for i in 0 ..< numConsumers:
    createThread(consumers[i], consumerThread, consumerArg)

  # Wait for all threads to finish
  for thread in producers:
    joinThread(thread)
  for thread in consumers:
    joinThread(thread)

  let duration = cpuTime() - start

  return (duration, sendSuccessCount.load, sendFailureCount.load,
          recvSuccessCount.load, recvFailureCount.load)

when isMainModule:
  var benchmarkTimes: RunningStat
  let numRuns = 5

  let numThreads = countProcessors()
  let numProducers = numThreads div 2
  let numConsumers = numThreads - numProducers

  echo "Running MPMC benchmark using ", numThreads, " threads"
  echo "Producers: ", numProducers, ", Consumers: ", numConsumers
  echo "Channel size: ", ChannelSize
  echo "Message type: array[10, int]"
  echo "Each benchmark will run ", numRuns, " times"

  for i in 1..numRuns:
    echo "\nRun ", i
    let (time, sendSuccess, sendFailure, recvSuccess, recvFailure) = runBenchmark()
    benchmarkTimes.push time
    echo "Time: ", time, " seconds"
    echo "trySend success: ", sendSuccess, ", failure: ", sendFailure
    echo "trySend success rate: ", (sendSuccess.float / (sendSuccess + sendFailure).float * 100), "%"
    echo "tryRecv success: ", recvSuccess, ", failure: ", recvFailure
    echo "tryRecv success rate: ", (recvSuccess.float / (recvSuccess + recvFailure).float * 100), "%"

    sleep(100) # Short pause between runs

  echo "\nMPMC Benchmark Results:"
  echo "Mean time: ", benchmarkTimes.mean, " seconds"
  echo "Standard deviation: ", benchmarkTimes.standardDeviation, " seconds"
Results without the changes

Running MPMC benchmark using 8 threads
Producers: 4, Consumers: 4
Channel size: 100
Message type: array[10, int]
Each benchmark will run 5 times

Run 1
Time: 0.0003458879999999999 seconds
trySend success: 185543, failure: 314457
trySend success rate: 37.1086%
tryRecv success: 185543, failure: 314457
tryRecv success rate: 37.1086%

Run 2
Time: 0.0002905350000000002 seconds
trySend success: 200945, failure: 299055
trySend success rate: 40.189%
tryRecv success: 200945, failure: 299055
tryRecv success rate: 40.189%

Run 3
Time: 0.000301808 seconds
trySend success: 255265, failure: 244735
trySend success rate: 51.053%
tryRecv success: 255265, failure: 244735
tryRecv success rate: 51.053%

Run 4
Time: 0.00032065199999999983 seconds
trySend success: 249849, failure: 250151
trySend success rate: 49.9698%
tryRecv success: 249849, failure: 250151
tryRecv success rate: 49.9698%

Run 5
Time: 0.00029954199999999986 seconds
trySend success: 296773, failure: 203227
trySend success rate: 59.3546%
tryRecv success: 296673, failure: 203327
tryRecv success rate: 59.3346%

MPMC Benchmark Results:
Mean time: 0.00031168499999999997 seconds
Standard deviation: 0.00001971082127157561 seconds

Changing acquire to tryAcquire

Running MPMC benchmark using 8 threads
Producers: 4, Consumers: 4
Channel size: 100
Message type: array[10, int]
Each benchmark will run 5 times

Run 1
Time: 0.00016166300000000004 seconds
trySend success: 43417, failure: 456583
trySend success rate: 8.6834%
tryRecv success: 43317, failure: 456683
tryRecv success rate: 8.6634%

Run 2
Time: 0.00030895999999999994 seconds
trySend success: 42557, failure: 457443
trySend success rate: 8.5114%
tryRecv success: 42457, failure: 457543
tryRecv success rate: 8.4914%

Run 3
Time: 0.0002811370000000001 seconds
trySend success: 44618, failure: 455382
trySend success rate: 8.9236%
tryRecv success: 44518, failure: 455482
tryRecv success rate: 8.9036%

Run 4
Time: 0.000292769 seconds
trySend success: 42148, failure: 457852
trySend success rate: 8.4296%
tryRecv success: 42048, failure: 457952
tryRecv success rate: 8.4096%

Run 5
Time: 0.00028581599999999993 seconds
trySend success: 43253, failure: 456747
trySend success rate: 8.6506%
tryRecv success: 43153, failure: 456847
tryRecv success rate: 8.6306%

MPMC Benchmark Results:
Mean time: 0.000266069 seconds
Standard deviation: 0.000053047392678622734 seconds

The most noticeable change is the sharp increase in the failure rate. Will investigate further tomorrow.

@ZoomRmc (Contributor) commented Aug 14, 2024

Well, the other benefit of a lock is being able to wait on it, leveraging the OS instead of constantly polling.

Some thoughts to consider:

Why does one need guaranteed non-blocking ops anyway? I think it's not just so you can burn cycles in a loop, but to be able to further manipulate the task that uses the channel, for example to cancel it or set a timeout. However, this module does not provide any means for that; the user needs to implement it either by bolting an additional synchronization mechanism onto the channel or by sending a signal via the channel itself.

So, what possible cases really suffer from the try routines waiting for the lock? The critical section guards just one memory write, a couple of atomic int writes and a signal. Moreover, the lock is only held when there's space/data in the channel; otherwise it's released immediately. It looks like lock contention can only grow significant in situations where the amount of data sent through the channel is disproportionately small compared to the number of active readers/writers (this point requires measurements to support it, though).

I suspect that the scenarios above will be much less common for the use-cases of this module than what your benchmark demonstrates: simply trying in a loop. So, to me it looks like a tradeoff in which using this module as a building block for a more complex abstraction will possibly benefit, but the basic use-case will be at some degree of performance disadvantage.

It's late and I'm probably missing something, but I hope I'm not totally off.

@planetis-m (Contributor, Author) commented Aug 15, 2024

Another benchmark measuring throughput.

Benchmark code
import threading/channels, std/[os, times, stats, cpuinfo, atomics, strutils]

type
  Message = array[10, int]

const
  BenchmarkDuration = 5.0  # seconds
  ChannelSize = 100

type
  ThreadArg = object
    chan: Chan[Message]
    runningFlag: ptr Atomic[bool]
    successCount: ptr Atomic[int]
    failureCount: ptr Atomic[int]

proc producerThread(arg: ThreadArg) {.thread.} =
  var msg: Message
  var i = 0
  while arg.runningFlag[].load(moRelaxed):
    inc i
    msg[0] = i  # Just to have some changing data
    if arg.chan.trySend(msg):
      discard arg.successCount[].fetchAdd(1)
    else:
      discard arg.failureCount[].fetchAdd(1)
    cpuRelax()

proc consumerThread(arg: ThreadArg) {.thread.} =
  var msg: Message
  while arg.runningFlag[].load(moRelaxed):
    if arg.chan.tryRecv(msg):
      discard arg.successCount[].fetchAdd(1)
    else:
      discard arg.failureCount[].fetchAdd(1)
    cpuRelax()

proc runBenchmark(): (float, int, int, int, int) =
  let numThreads = countProcessors()
  let numProducers = numThreads div 2
  let numConsumers = numThreads - numProducers

  var chan = newChan[Message](elements = ChannelSize)
  var producers: seq[Thread[ThreadArg]]
  var consumers: seq[Thread[ThreadArg]]
  producers.setLen(numProducers)
  consumers.setLen(numConsumers)

  var runningFlag: Atomic[bool]
  runningFlag.store(true)

  var sendSuccessCount, sendFailureCount, recvSuccessCount, recvFailureCount: Atomic[int]
  sendSuccessCount.store(0)
  sendFailureCount.store(0)
  recvSuccessCount.store(0)
  recvFailureCount.store(0)

  let producerArg = ThreadArg(
    chan: chan,
    runningFlag: addr runningFlag,
    successCount: addr sendSuccessCount,
    failureCount: addr sendFailureCount
  )

  let consumerArg = ThreadArg(
    chan: chan,
    runningFlag: addr runningFlag,
    successCount: addr recvSuccessCount,
    failureCount: addr recvFailureCount
  )

  # Start producer threads
  for i in 0 ..< numProducers:
    createThread(producers[i], producerThread, producerArg)

  # Start consumer threads
  for i in 0 ..< numConsumers:
    createThread(consumers[i], consumerThread, consumerArg)

  # Run for fixed duration
  sleep(int(BenchmarkDuration * 1000))

  # Stop all threads
  runningFlag.store(false)

  # Wait for all threads to finish
  for thread in producers:
    joinThread(thread)
  for thread in consumers:
    joinThread(thread)

  return (BenchmarkDuration, sendSuccessCount.load, sendFailureCount.load,
          recvSuccessCount.load, recvFailureCount.load)

when isMainModule:
  var sendThroughput, recvThroughput: RunningStat
  let numRuns = 5

  let numThreads = countProcessors()
  let numProducers = numThreads div 2
  let numConsumers = numThreads - numProducers

  echo "Running MPMC throughput benchmark using ", numThreads, " threads"
  echo "Producers: ", numProducers, ", Consumers: ", numConsumers
  echo "Channel size: ", ChannelSize
  echo "Message type: array[10, int]"
  echo "Benchmark duration: ", BenchmarkDuration, " seconds"
  echo "Number of runs: ", numRuns

  for i in 1..numRuns:
    echo "\nRun ", i
    let (duration, sendSuccess, sendFailure, recvSuccess, recvFailure) = runBenchmark()

    let sendTotal = sendSuccess + sendFailure
    let recvTotal = recvSuccess + recvFailure
    let sendThroughputOps = sendTotal.float / duration
    let recvThroughputOps = recvTotal.float / duration

    sendThroughput.push sendThroughputOps
    recvThroughput.push recvThroughputOps

    echo "Send throughput: ", sendThroughputOps.formatFloat(ffDecimal, 2), " ops/s"
    echo "Send success rate: ", (sendSuccess.float / sendTotal.float * 100).formatFloat(ffDecimal, 2), "%"
    echo "Receive throughput: ", recvThroughputOps.formatFloat(ffDecimal, 2), " ops/s"
    echo "Receive success rate: ", (recvSuccess.float / recvTotal.float * 100).formatFloat(ffDecimal, 2), "%"

    sleep(100) # Short pause between runs

  echo "\nMPMC Benchmark Results:"
  echo "Mean send throughput: ", sendThroughput.mean.formatFloat(ffDecimal, 2), " ops/s"
  echo "Send throughput std dev: ", sendThroughput.standardDeviation.formatFloat(ffDecimal, 2), " ops/s"
  echo "Mean receive throughput: ", recvThroughput.mean.formatFloat(ffDecimal, 2), " ops/s"
  echo "Receive throughput std dev: ", recvThroughput.standardDeviation.formatFloat(ffDecimal, 2), " ops/s"
Results without the changes

Running MPMC throughput benchmark using 8 threads
Producers: 4, Consumers: 4
Channel size: 100
Message type: array[10, int]
Benchmark duration: 5.0 seconds
Number of runs: 5

Run 1
Send throughput: 1086039.00 ops/s
Send success rate: 97.23%
Receive throughput: 3826210.60 ops/s
Receive success rate: 27.60%

Run 2
Send throughput: 1357983.80 ops/s
Send success rate: 99.80%
Receive throughput: 4211014.00 ops/s
Receive success rate: 32.19%

Run 3
Send throughput: 1767906.80 ops/s
Send success rate: 99.85%
Receive throughput: 5968890.00 ops/s
Receive success rate: 29.58%

Run 4
Send throughput: 1695748.80 ops/s
Send success rate: 99.92%
Receive throughput: 6218613.80 ops/s
Receive success rate: 27.25%

Run 5
Send throughput: 1676615.60 ops/s
Send success rate: 99.83%
Receive throughput: 5683107.00 ops/s
Receive success rate: 29.45%

MPMC Benchmark Results:
Mean send throughput: 1516858.80 ops/s
Send throughput std dev: 257447.44 ops/s
Mean receive throughput: 5181567.08 ops/s
Receive throughput std dev: 972198.54 ops/s

Changing acquire to tryAcquire

Running MPMC throughput benchmark using 8 threads
Producers: 4, Consumers: 4
Channel size: 100
Message type: array[10, int]
Benchmark duration: 5.0 seconds
Number of runs: 5

Run 1
Send throughput: 7057833.80 ops/s
Send success rate: 10.71%
Receive throughput: 8034867.00 ops/s
Receive success rate: 9.41%

Run 2
Send throughput: 9773060.00 ops/s
Send success rate: 10.34%
Receive throughput: 10781693.20 ops/s
Receive success rate: 9.38%

Run 3
Send throughput: 11612252.40 ops/s
Send success rate: 10.40%
Receive throughput: 12272799.80 ops/s
Receive success rate: 9.84%

Run 4
Send throughput: 11535241.40 ops/s
Send success rate: 10.20%
Receive throughput: 12171055.40 ops/s
Receive success rate: 9.66%

Run 5
Send throughput: 11242899.00 ops/s
Send success rate: 10.53%
Receive throughput: 12424934.60 ops/s
Receive success rate: 9.53%

MPMC Benchmark Results:
Mean send throughput: 10244257.32 ops/s
Send throughput std dev: 1726884.91 ops/s
Mean receive throughput: 11137070.00 ops/s
Receive throughput std dev: 1659370.93 ops/s

Notice the 4x-6x increase in throughput.

@ZoomRmc (Contributor) commented Aug 15, 2024

Well, it's no surprise you get such an increase if you count messages that weren't even sent towards the throughput.

Counting only successfully sent/received messages:

Running MPMC throughput benchmark using 16 threads

Old:

MPMC Benchmark Results:
Mean send throughput: 793689.96 ops/s
Send throughput std dev: 36206.40 ops/s
Mean receive throughput: 793685.32 ops/s
Receive throughput std dev: 36207.10 ops/s

New (change below):

MPMC Benchmark Results:
Mean send throughput: 723515.12 ops/s
Send throughput std dev: 79244.62 ops/s
Mean receive throughput: 723514.32 ops/s
Receive throughput std dev: 79244.08 ops/s

9% decrease in throughput


The only change to channels.nim:

# channelSend
when not blocking:
  if chan.isFull() or not tryAcquire(chan.lock): return false
else:
  acquire(chan.lock)

# ... channelReceive
when not blocking:
  if chan.isEmpty() or not tryAcquire(chan.lock): return false
else:
  acquire(chan.lock)

@planetis-m (Contributor, Author) commented Aug 15, 2024

After playing with the benchmark's parameters for a while, I have decided to revert the tryAcquire change. I noticed that even when the channel size is enlarged to fit all produced messages, the success rates for trySend remain the same (5-9%), while the previous version gives 100% success. It does indeed seem like a bad change.

EDIT: Adding exponential backoff makes the two benchmark results indistinguishable; both implementations reach a 98% success rate.

const
  InitialBackoff = 1  # milliseconds (os.sleep sleeps for milliseconds)
  MaxBackoff = 16  # milliseconds

proc exponentialBackoff(backoff: var int) =
  if backoff < MaxBackoff:
    backoff *= 2  # double the delay until it is capped at MaxBackoff
  sleep(backoff)
  backoff = max(backoff, InitialBackoff)  # keep at least the initial delay

proc producerThread(arg: ThreadArg) {.thread.} =
  var msg: Message
  var backoff = InitialBackoff
  for i in 1..arg.iterations:
    msg[0] = i  # Just to have some changing data
    if arg.chan.trySend(msg):
      discard arg.successCount[].fetchAdd(1)
      backoff = InitialBackoff  # Reset backoff on success
    else:
      discard arg.failureCount[].fetchAdd(1)
      exponentialBackoff(backoff)
    cpuRelax()

@planetis-m (Contributor, Author)

@Araq it's your call. This PR is complete, please review.

@ZoomRmc (Contributor) commented Aug 15, 2024

There's still no reasoning provided for this change.

For the basic case in your first benchmark, the throughput is getting worse.
I suspect the CPU load is also increasing, which is not easy to measure without profiling due to how dynamic modern CPUs are.
A concrete scenario that would benefit from the change would be nice.

Another thing is that I suspect this can facilitate worse designs down the line. If your program spends any considerable time blocking on the lock, that means the provisioning (number of workers and channel size) is most probably off.

@@ -286,39 +288,43 @@ proc `=copy`*[T](dest: var Chan[T], src: Chan[T]) =
   `=destroy`(dest)
   dest.d = src.d

-proc trySend*[T](c: Chan[T], src: sink Isolated[T]): bool {.inline.} =
+proc trySend*[T](c: Chan[T], src: var Isolated[T]): bool {.inline.} =
Member:

Why was this change necessary?

Contributor Author:

Because trySend can fail, and in that case whatever was sunk must be destroyed. And if trySend is used in a loop, the sink parameter creates copies. Maybe T cannot even be copied. This PR removes all possible copies.
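
A minimal standalone illustration of that copy (a sketch, not code from the PR): when an lvalue that is still live after the call is passed to a sink parameter, the compiler must insert `=dup`.

proc consume(x: sink string) =
  discard x  # consume owns x; it is destroyed when the proc returns

proc retryLoop() =
  var s = "payload"
  for _ in 1 .. 3:
    # s is still live on the next iteration, so the compiler cannot move it
    # into the sink parameter; each call receives a `=dup`(s) copy instead
    consume(s)

retryLoop()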

Member:

But then trySend(chan, isolate MyObject(...)) does not compile anymore. Maybe introduce trySendMut instead (or something with a better name).

Contributor Author:

Done, I added trySendMut (name seems fine).

Member:

No, the name sucks as it stresses what happens but not why it happens. The name should be trySendNocopy but then people (myself included) wonder why the sink does not ensure "no copy" already. What is really going on here?

Contributor Author:

@Araq I tried to explain in a comment.

Or read mratsim's view in RFC #347.

Member:

There is no problem here with sink or copies per se. sink takes over ownership, always, and ideally trySend takes over ownership conditionally. As such var works better but one needs to watch out not to reuse the object afterwards.

Maybe it should be named tryTake.
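
To make the conditional-ownership semantics concrete, here is a toy sketch of what a tryTake-style contract means (hypothetical code, not the merged API; a seq stands in for the channel buffer):

import std/isolation

proc tryTake[T](queue: var seq[T], src: var Isolated[T]): bool =
  ## Ownership transfers only on success.
  if queue.len < 4:           # stand-in for "the channel has free space"
    queue.add extract(src)    # move the value out; src is now moved-from
    result = true
  else:
    result = false            # src is untouched; the caller may retry

var q: seq[string]
var p = isolate("hello")
doAssert q.tryTake(p)         # true: do not reuse p afterwards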

@planetis-m (Contributor, Author)

For the basic case in your first benchmark, the throughput is getting worse.

In your previous comment you said that you noticed a 9% decrease in throughput. If that's the case, then for a multithreaded benchmark especially, that's just jitter. But thank you anyway for confirming that there's no real repercussion from this PR; this is consistent with my testing as well.

I explained the reasoning for this change earlier in the comments; if you have something to contest, let's do it in good faith. I am willing to listen if you have some actual channel usage that's impacted by this change, not just possible scenarios.

Here's an idea: let's take the discussion over to Discord. Maybe we can come up with something.

@@ -187,8 +187,9 @@ proc channelSend(chan: ChannelRaw, data: pointer, size: int, blocking: static bo

   when not blocking:
     if chan.isFull(): return false

-  acquire(chan.lock)
+  if not tryAcquire(chan.lock): return false
Member:

I'm confused. Didn't you say you removed this new logic again?

@planetis-m (Contributor, Author) commented Aug 16, 2024

Sorry. The documentation states that this version of send doesn't block, so I assumed it would need to use a non-blocking call for acquiring the lock. I wrote two multiple-producers/multiple-consumers benchmarks, and I see high contention. Notice that there's a single lock shared by both producers and consumers, and I assume that's the primary reason for the high contention. Using tryAcquire, as expected, increases the number of attempted calls (4x-6x), while using acquire instead improves the ratio of successful calls (from 10% to 50%), although the total number of successful calls remains about the same. I wonder if that's just a side-effect of the fairness guarantees implemented for normal locks. I then implemented exponential backoff for both benchmarks; the success rates improved to 98%, so that's clearly the better mechanism to fix the issue. As such, I claim that acquire is not needed here; if anything, trySend/tryRecv now behave similarly to the same-named functions of a lock-free channel.

@ZoomRmc (Contributor) commented Aug 16, 2024

But thank you anyway for confirming that there's no real repercussion from this PR.

I haven't confirmed anything like that. As I said above, to get a real idea we'd need to check, at the very least, a couple of other scenarios (one-to-many, many-to-one).

I explained the reasoning for this change earlier in the comments

The only things I see that could pass for that are "it makes no sense to keep this limitation as the function's contract", the link to the RFC that hasn't been accepted yet, and this comment from you.

If you have something to contest, let's do it in good faith.

Absolutely. I've laid out some of my concerns openly above, and I don't have any hidden motivation in any of the questions in previous comments.

I am willing to listen if you have some actual channel usage that's impacted by this change, not just possible scenarios.

It's your PR, so perhaps it should be the other way round.

@planetis-m (Contributor, Author)

This PR contains several important bug fixes and API additions for channels, and there's no reason not to merge it. After all, the docs state:

Warning: This module is experimental and its interface may change.

@Araq merged commit 5501a4a into nim-lang:master on Aug 26, 2024 · 12 checks passed
@Araq (Member) commented Aug 26, 2024

My bad, I had missed the "revert to plain acquire", which was what had kept me from merging it.

@planetis-m (Contributor, Author)

Thank you very much!
