Queue #100924
Conversation
Add generic `queue.Queue[T]` container. Supports the `Push()` operation. Supports FIFO and LIFO consumption (`PopFront()`, `Pop()`). This implementation allocates chunks of []T to amortize memory allocations. Release note: None
The `concurrent` package implements various primitives around asynchronous execution. `concurrent.Executor` defines an executor that can execute functions. Of course, Go has a perfectly good `go func()` mechanism to execute concurrent code. However, sometimes the caller wants to avoid spinning up many goroutines in short bursts. Doing so tends to negatively impact the Go runtime and cause spikes in latency. `concurrent.NewWorkQueue` implements a mechanism whereby the caller may create a work queue -- a queue of closures -- that will run on a bounded number of worker goroutines. Release note: None
A generic high-performance FIFO/LIFO queue with amortized allocation makes sense. We have another such implementation in the Raft scheduler that it might make sense to combine with this (we'll likely remove the prioritization there shortly in #101023): cockroach/pkg/kv/kvserver/scheduler.go Lines 62 to 72 in 3478140
I'm not so sure about the executor. I feel like there's always enough variation and subtlety in how we want to do concurrency in various components, and it's trivial to spin up a few goroutines as needed. I also feel like we already have too many different utility functions/frameworks for doing stuff like this. If we have more than three places where it makes sense to reuse exactly this, and we can consolidate some other utility functions, then sure, let's do it, but otherwise I'd be inclined to implement custom workers as needed and instead make them easy to construct. In any case, this seems a bit overkill for the mux rangefeed shutdown case, since there's only a single worker and performance/allocations don't really matter afaict.
@erikgrinaker thanks for the review; do you want me to split up this draft and just send the queue portion for "official" review?
Sure.
@miretskiy Do you have a specific use-case in mind for this queue? Let's compare with Go's standard slice, which can also be used like a queue: enqueue is a simple slice Specifically, if this implementation is better in some sense, let's have benchmarks? The standard slice is also allocation-efficient: there are O(log N) allocations for N appends. It's also space-efficient: max 2x overhead. Copy-efficient: amortised O(1) copies per element. But it's not thread-safe. So there should be selling points. Are we aiming to implement a thread-safe/lock-free queue? Which also does not copy entries around, so pointers to its elements stay valid across appends? Are we basically implementing a var-size channel?
Sure; and in the above case, the underlying slice array might stick around for a while;
Well, that's at least what the other PR tried to do -- built on top of that queue to give more/less efficient |
Also, if we're supporting both LIFO/FIFO, should we name it
Yep. It will be released next time an append reallocates. But I agree the lifetime of an element would be slightly less controlled. This could be worked around as you say: we could sometimes force the reallocation. With the chunked queue this is true too. An element stays in the chunk until the chunk is released, which generally can happen after a while too (when dequeues pass the chunk border).
The reason for the cockroach/pkg/kv/kvserver/scheduler.go Lines 62 to 66 in 3478140
Those and the chunked reallocations (as opposed to ever-growing ones) are the main advantages over a simple slice. I'd only really consider a chunk queue in performance-critical high-throughput hot paths like the Raft scheduler.
Can't really put cdc in the same realm as raft, but we had to switch to the chunked allocation because of the |
Note that the raft queue has a fixed chunk size. In this PR the chunk size increases depending on the queue length (with min and max limits on the chunk size). This policy could be made a parameter, or we could use the same for simplicity. @erikgrinaker would there be a benefit in making this queue [semi-]lock-free? Currently the raft one is used under a lock, right? But the lock is really only needed for allocating / linking in a new chunk. Within a chunk, appends/pops could be lock-free using some compare-and-swaps.
Also, the raft queue seems to be multi-producer/single-consumer? This could also be optimized for. The complicating bit is that the raft queue is integrated with the scheduler. The lock is used both for the queue and for the scheduler bits. So, to take advantage of a lock-free queue, we would need to untangle the scheduler a bit. Maybe there is no large benefit in deduping the enqueues (the invariant in the raft scheduler is that the range ID is present in the queue only once). We could make it a bit more best-effort/at-least-once, if the "less locking" benefit outweighs the memory savings. We could probably still retain exactly-once though, with some workarounds.
I can certainly make it lock-free. And variable chunk size; well, that's just trying to be a bit fancy.
Also, the raft scheduler has 2 kinds of queues:
Seemingly, we could use a generic chunked lock-free queue for both cases.
Yeah, this is going to require a larger restructuring of the scheduler. Could try some quick experiments just to see how large the gains would be (if any). Want to write up an issue?
@erikgrinaker Yeah, I can write up. Also would be up for experimenting, to get a feel for this approach as it might end up useful in rangefeeds work.
If you don't mind, I'll spend some time trying to make this a lock-free deque, to see how it performs.
pkg/util: Add generic queue.Queue[T] container

Add generic `queue.Queue[T]` container. Supports the `Push()`
operation. Supports FIFO and LIFO consumption (`PopFront()`,
`Pop()`). This implementation allocates chunks of []T to
amortize memory allocations.

pkg/util: Add concurrent package

The `concurrent` package implements various primitives
around asynchronous execution. `concurrent.Executor`
defines an executor that can execute functions.

Of course, Go has a perfectly good `go func()` mechanism
to execute concurrent code. However, sometimes the caller
wants to avoid spinning up many goroutines in short bursts.
Doing so tends to negatively impact the Go runtime and cause
spikes in latency. `concurrent.NewWorkQueue` implements
a mechanism whereby the caller may create a work queue --
a queue of closures -- that will run on a bounded
number of worker goroutines.

Epic: None
Release note: None