
Queue #100924 (Draft)

miretskiy wants to merge 2 commits into master
Conversation

@miretskiy (Contributor) commented Apr 7, 2023

pkg/util: Add generic queue.Queue[T] container

Add a generic queue.Queue[T] container.
Supports the Push() operation, and both FIFO
and LIFO consumption (PopFront(), Pop()).

This implementation allocates chunks of []T to
amortize memory allocations.
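
A minimal sketch of what a chunked Queue[T] along these lines could look like; the fixed chunk size, field names, and omission of the LIFO Pop() path are simplifications of my own for illustration, not the PR's actual code (the PR grows its chunk size with the queue length):

```go
// Hypothetical sketch only; details are illustrative.
package queue

const chunkSize = 128

type chunk[T any] struct {
	items      [chunkSize]T
	head, tail int       // read/write positions within this chunk
	next       *chunk[T] // chunks form a singly linked list
}

// Queue holds elements in chunks of T to amortize allocation/GC cost.
type Queue[T any] struct {
	front, back *chunk[T]
	len         int
}

// Push appends v, allocating a new chunk only when the back chunk is full.
func (q *Queue[T]) Push(v T) {
	if q.back == nil || q.back.tail == chunkSize {
		c := &chunk[T]{}
		if q.back == nil {
			q.front = c
		} else {
			q.back.next = c
		}
		q.back = c
	}
	q.back.items[q.back.tail] = v
	q.back.tail++
	q.len++
}

// PopFront removes and returns the oldest element (FIFO).
func (q *Queue[T]) PopFront() (T, bool) {
	var zero T
	if q.len == 0 {
		return zero, false
	}
	c := q.front
	v := c.items[c.head]
	c.items[c.head] = zero // release the element for GC
	c.head++
	q.len--
	if c.head == c.tail {
		q.front = c.next // drop the exhausted chunk
		if q.front == nil {
			q.back = nil
		}
	}
	return v, true
}
```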

pkg/util: Add concurrent package

The concurrent package implements various primitives
around asynchronous execution.

concurrent.Executor defines an executor that can
execute functions.

Of course, Go has a perfectly good go func() mechanism
to execute concurrent code. However, sometimes the caller
wants to avoid spinning up many goroutines in short bursts.
Doing so tends to negatively impact the Go runtime and cause
spikes in latency. concurrent.NewWorkQueue implements
a mechanism whereby the caller may create a work queue --
a queue of closures -- that will run on a bounded
number of worker goroutines.
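
For illustration, here is the bounded-worker idea in plain Go using a fixed-size channel. This is only the general pattern the description refers to; the actual concurrent package presumably backs its work queue with the chunked queue above, and its constructor and method names may differ:

```go
package main

import "sync"

func main() {
	const numWorkers = 4
	work := make(chan func(), 1024) // queue of closures

	// A bounded number of long-lived workers drain the queue, instead of
	// spawning one goroutine per task in a burst.
	var wg sync.WaitGroup
	for i := 0; i < numWorkers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for fn := range work {
				fn()
			}
		}()
	}

	// Producers enqueue closures.
	for i := 0; i < 100; i++ {
		i := i
		work <- func() { _ = i * i } // stand-in for real work
	}

	close(work) // no more work; workers exit once drained
	wg.Wait()
}
```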

Epic: None

Release note: None

@cockroach-teamcity (Member):

This change is Reviewable

Yevgeniy Miretskiy added 2 commits April 7, 2023 17:14
Add a generic `queue.Queue[T]` container.
Supports the `Push()` operation, and both FIFO
and LIFO consumption (`PopFront()`, `Pop()`).

This implementation allocates chunks of []T to
amortize memory allocations.

Release note: None

The `concurrent` package implements various primitives
around asynchronous execution.

`concurrent.Executor` defines an executor that can
execute functions.

Of course, Go has a perfectly good `go func()` mechanism
to execute concurrent code.  However, sometimes the caller
wants to avoid spinning up many goroutines in short bursts.
Doing so tends to negatively impact the Go runtime and cause
spikes in latency.  `concurrent.NewWorkQueue` implements
a mechanism whereby the caller may create a work queue --
a queue of closures -- that will run on a bounded
number of worker goroutines.

Release note: None
@erikgrinaker (Contributor):

A generic high-performance FIFO/LIFO queue with amortized allocation makes sense. We have another such implementation in the Raft scheduler that it might make sense to combine with this (we'll likely remove the prioritization there shortly in #101023):

// rangeIDQueue is a chunked queue of range IDs. Instead of a separate list
// element for every range ID, it uses a rangeIDChunk to hold many range IDs,
// amortizing the allocation/GC cost. Using a chunk queue avoids any copying
// that would occur if a slice were used (the copying would occur on slice
// reallocation).
//
// The queue has a naive understanding of priority and fairness. For the most
// part, it implements a FIFO queueing policy with no prioritization of some
// ranges over others. However, the queue can be configured with up to one
// high-priority range, which will always be placed at the front when added.
type rangeIDQueue struct {

I'm not so sure about the executor. I feel like there's always enough variation and subtlety in how we want to do concurrency in various components, and it's trivial to spin up a few goroutines as needed. I also feel like we already have too many different utility functions/frameworks for doing stuff like this. If we have more than three places where it makes sense to reuse exactly this, and we can consolidate some other utility functions, then sure, let's do it, but otherwise I'd be inclined to implement custom workers as needed and instead make them easy to construct.

In any case, this seems a bit overkill for the mux rangefeed shutdown case, since there's only a single worker and performance/allocations don't really matter afaict.

@miretskiy (Contributor, Author):

@erikgrinaker thanks for the review; do you want me to split up this draft and just send the queue portion for "official" review?

@erikgrinaker (Contributor):

Sure.

@pav-kv (Collaborator) commented Apr 11, 2023

@miretskiy Do you have a specific use-case in mind for this queue? Let's compare it with a standard Go slice, which can also be used like a queue: enqueue is a simple slice append, dequeue/pop is `item, slice = slice[0], slice[1:]`.

Specifically, if this implementation is better in some sense, let's have benchmarks?

The standard slice is also allocation-efficient: there are O(log N) allocations for N appends. It's also space-efficient: max 2x overhead. Copy-efficient: amortised O(1) copies per element. But it's not thread-safe.

So there should be selling points. Are we aiming to implement a thread-safe/lock-free queue? Which also does not copy entries around, so pointers to its elements stay valid across appends?

Are we basically implementing a var-size channel?
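
Spelled out, the slice-as-queue pattern being compared against (a runnable sketch):

```go
package main

import "fmt"

func main() {
	// enqueue: plain append; O(log N) reallocations over N appends,
	// amortized O(1) copies per element, at most ~2x space overhead.
	var q []int
	for i := 0; i < 5; i++ {
		q = append(q, i)
	}

	// dequeue (FIFO): constant time, but the backing array is not shrunk,
	// so dequeued elements remain reachable until a later append reallocates.
	var item int
	item, q = q[0], q[1:]
	fmt.Println(item, q) // 0 [1 2 3 4]

	// pop (LIFO) from the same slice:
	item, q = q[len(q)-1], q[:len(q)-1]
	fmt.Println(item, q) // 4 [1 2 3]
}
```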

@miretskiy (Contributor, Author):

> @miretskiy Do you have a specific use-case in mind for this queue? Let's compare it with a standard Go slice, which can also be used like a queue: enqueue is a simple slice append, dequeue/pop is `item, slice = slice[0], slice[1:]`.

Sure; and in the above case, the underlying slice array might stick around for a while.
Do many pops like this, and you will wind up with mysterious memory that can't be GCed.
Of course, there are tricks you can use -- copy the tail of the queue eventually -- but then,
I guess, a slice is not that easy to use. Anyway, if this is not used in the muxrf PR, this
will probably remain a draft PR until more use cases arise.

> Specifically, if this implementation is better in some sense, let's have benchmarks?

> The standard slice is also allocation-efficient: there are O(log N) allocations for N appends. It's also space-efficient: max 2x overhead. Copy-efficient: amortised O(1) copies per element. But it's not thread-safe.

> So there should be selling points. Are we aiming to implement a thread-safe queue? Which also does not copy entries around, so pointers to its elements stay valid across appends?

Well, that's at least what the other PR tried to do -- build on top of that queue to provide a more or less efficient, thread-safe producer/consumer queue.

@pav-kv (Collaborator) commented Apr 11, 2023

Also, if we're supporting both LIFO/FIFO, should we name it Deque? https://en.wikipedia.org/wiki/Double-ended_queue

@pav-kv (Collaborator) commented Apr 11, 2023

> Sure; and in the above case, the underlying slice array might stick around for a while.

Yep. It will be released next time an append reallocates. But I agree the lifetime of an element would be slightly less controlled. This could be worked around as you say: we could sometimes force the reallocation.

With the chunked queue this is true too. An element stays in the chunk until the chunk is released, which generally can happen after a while too (when dequeues pass the chunk border).
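
One way that workaround could look: track a head index and copy the live tail into a fresh slice once the dead prefix grows large. The type name and threshold below are arbitrary choices for this sketch:

```go
// sliceq is an illustrative sketch, not part of the PR.
package sliceq

// sliceQueue is a slice-backed FIFO that bounds retained memory by
// occasionally forcing a reallocation.
type sliceQueue[T any] struct {
	buf  []T
	head int // index of the next element to dequeue
}

func (q *sliceQueue[T]) enqueue(v T) { q.buf = append(q.buf, v) }

func (q *sliceQueue[T]) dequeue() (T, bool) {
	var zero T
	if q.head == len(q.buf) {
		return zero, false
	}
	v := q.buf[q.head]
	q.buf[q.head] = zero // drop the reference so the element can be GCed
	q.head++
	// Force a reallocation once more than half of the backing array is dead,
	// letting the old array be garbage collected.
	if q.head > 32 && q.head > len(q.buf)/2 {
		q.buf = append([]T(nil), q.buf[q.head:]...)
		q.head = 0
	}
	return v, true
}
```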

@erikgrinaker (Contributor):

The reason for the rangeIDQueue in the Raft scheduler is precisely to avoid this copying on reallocations.

// rangeIDQueue is a chunked queue of range IDs. Instead of a separate list
// element for every range ID, it uses a rangeIDChunk to hold many range IDs,
// amortizing the allocation/GC cost. Using a chunk queue avoids any copying
// that would occur if a slice were used (the copying would occur on slice
// reallocation).

Those and the chunked reallocations (as opposed to ever-growing ones) are the main advantages over a simple slice. I'd only really consider a chunk queue in performance-critical high-throughput hot paths like the Raft scheduler.

@miretskiy (Contributor, Author):

> Those and the chunked reallocations (as opposed to ever-growing ones) are the main advantages over a simple slice. I'd only really consider a chunk queue in performance-critical high-throughput hot paths like the Raft scheduler.

Can't really put CDC in the same realm as Raft, but we had to switch to chunked allocation for the
same reason: many events arriving caused many small allocations, as opposed to chunked allocations.

@pav-kv (Collaborator) commented Apr 11, 2023

Note that the raft queue has a fixed chunk size. In this PR the chunk size increases depending on the queue length (with min and max limits on the chunk size). This policy could be made a parameter, or we could use the same for simplicity.

@erikgrinaker would there be a benefit in making this queue [semi-]lock-free? Currently the raft one is used under a lock, right? But the lock is really only needed for allocating / linking in a new chunk. Within a chunk, appends/pops could be lock-free using some compare-and-swaps.
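
A rough sketch of the "lock-free within a chunk" idea for the multi-producer append path. The type, field names, and chunk size are made up for illustration; a complete implementation would also need a per-slot "ready" marker so a consumer never reads a slot before the producer's write lands:

```go
package concurrentsketch

import "sync/atomic"

// mpChunk lets multiple producers claim slots without a lock; the queue lock
// would only be needed when a full chunk forces linking in a new one.
type mpChunk[T any] struct {
	buf   [256]T
	write atomic.Int64 // next slot to claim
}

// tryPush reserves a slot with an atomic add and writes into it. It returns
// false when the chunk is full, in which case the caller takes the queue
// lock and links in a fresh chunk.
func (c *mpChunk[T]) tryPush(v T) bool {
	idx := c.write.Add(1) - 1
	if idx >= int64(len(c.buf)) {
		return false
	}
	c.buf[idx] = v
	return true
}
```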

@pav-kv (Collaborator) commented Apr 11, 2023

Also, the raft queue seems to be multi-producer/single-consumer? This could also be optimized for.

The complicating bit is that the raft queue is integrated with the scheduler. The lock is used both for the queue, and for the scheduler bits. So, to take advantage of a lock-free queue, we would need to untangle the scheduler a bit.

Maybe there is no large benefit in deduping the enqueues (the invariant in the raft scheduler is that a range ID is present in the queue only once). We could make it a bit more best-effort/at-least-once, if the "less locking" benefit outweighs the memory savings. We could probably still retain exactly-once though, with some workarounds.

@miretskiy (Contributor, Author):

> Note that the raft queue has a fixed chunk size. In this PR the chunk size increases depending on the queue length (with min and max limits on the chunk size). This policy could be made a parameter, or we could use the same for simplicity.

> @erikgrinaker would there be a benefit in making this queue [semi-]lock-free? Currently the raft one is used under a lock, right? But the lock is really only needed for allocating / linking in a new chunk. Within a chunk, appends/pops could be lock-free using some compare-and-swaps.

I can certainly make it lock-free. As for the variable chunk size, well, that's just trying to be
a bit fancy: smaller chunks for a smaller queue.

@pav-kv (Collaborator) commented Apr 11, 2023

Also, the raft scheduler has 2 kinds of queues:

  • raftReceiveQueue, which is per-range. It is multi-producer/single-consumer (the consumer is the range under raftMu?).
  • The scheduling queue that keeps the "dirty" ranges. Also multi-producer/single-consumer (the consumer is the scheduler shard).

Seemingly, we could use a generic chunked lock-free queue for both cases.

@erikgrinaker (Contributor):

> The complicating bit is that the raft queue is integrated with the scheduler. The lock is used both for the queue, and for the scheduler bits. So, to take advantage of a lock-free queue, we would need to untangle the scheduler a bit.

Yeah, this is going to require a larger restructuring of the scheduler. Could try some quick experiments just to see how large the gains would be (if any). Want to write up an issue?

@pav-kv (Collaborator) commented Apr 11, 2023

@erikgrinaker Yeah, I can write it up. I'd also be up for experimenting, to get a feel for this approach, as it might end up useful in the rangefeeds work.

@miretskiy (Contributor, Author):

If you don't mind, I'll spend some time trying to make this a lock-free deque, to see how it performs.

