Alloc count regression in #8 #14

Closed

tbg opened this issue Dec 22, 2022 · 5 comments

tbg commented Dec 22, 2022

Looks like #8 gave us a regression in alloc count.

$ git checkout 65a0bf3
$ benchdiff --old HEAD~1 -r BenchmarkRawNode -c 5 .
checking out '91eb751'
building benchmark binaries for '91eb751' 1/1 |
test binaries already exist for '65a0bf3'; skipping build

  pkg=1/1 iter=5/5 go.etcd.io/raft/v3 -

name                     old time/op        new time/op        delta
RawNode/two-voters-10          1.14µs ± 1%        1.39µs ± 3%  +22.47%  (p=0.008 n=5+5)
RawNode/single-voter-10         685ns ± 2%         918ns ± 2%  +33.99%  (p=0.008 n=5+5)

name                     old firstIndex/op  new firstIndex/op  delta
RawNode/single-voter-10          2.00 ± 0%          1.00 ± 0%  -50.00%  (p=0.008 n=5+5)
RawNode/two-voters-10            6.00 ± 0%          3.00 ± 0%  -50.00%  (p=0.008 n=5+5)

name                     old lastIndex/op   new lastIndex/op   delta
RawNode/single-voter-10          2.00 ± 0%          2.00 ± 0%     ~     (all equal)
RawNode/two-voters-10            2.00 ± 0%          2.00 ± 0%     ~     (all equal)

name                     old ready/op       new ready/op       delta
RawNode/single-voter-10          2.00 ± 0%          2.00 ± 0%     ~     (all equal)
RawNode/two-voters-10            2.00 ± 0%          2.00 ± 0%     ~     (all equal)

name                     old term/op        new term/op        delta
RawNode/two-voters-10            1.00 ± 0%          1.00 ± 0%     ~     (all equal)
RawNode/single-voter-10          0.00 ± 0%          0.00 ± 0%  +33.33%  (p=0.008 n=5+5)

name                     old alloc/op       new alloc/op       delta
RawNode/single-voter-10          415B ± 1%          534B ± 0%  +28.62%  (p=0.016 n=5+4)
RawNode/two-voters-10            637B ± 0%          888B ± 1%  +39.43%  (p=0.008 n=5+5)

name                     old allocs/op      new allocs/op      delta
RawNode/two-voters-10            7.00 ± 0%          8.00 ± 0%  +14.29%  (p=0.008 n=5+5)
RawNode/single-voter-10          5.00 ± 0%          6.00 ± 0%  +20.00%  (p=0.008 n=5+5)

tbg commented Dec 22, 2022

cc @nvanbenschoten

nvanbenschoten commented Dec 22, 2022

Thanks @tbg for filing this.

The alloc/op (allocation size) metric regressed, but that was more than compensated for by the improvement from 55e6a83, so we did not see a regression in alloc/op across this work as a whole. That was part of the reason why we landed the two together. To some extent, the same applies to the time/op metric.

➜ benchdiff --new 55e6a83 -r BenchmarkRawNode -c 10 .
test binaries already exist for '07da8bd'; skipping build
test binaries already exist for '55e6a83'; skipping build

  pkg=1/1 iter=10/10 go.etcd.io/etcd/raft/v3 |

name                     old time/op        new time/op        delta
RawNode/two-voters-10          1.30µs ± 1%        1.13µs ± 2%  -12.97%  (p=0.000 n=10+10)
RawNode/single-voter-10         732ns ± 2%         716ns ± 1%   -2.17%  (p=0.000 n=10+10)

name                     old firstIndex/op  new firstIndex/op  delta
RawNode/single-voter-10          5.00 ± 0%          5.00 ± 0%     ~     (all equal)
RawNode/two-voters-10            9.00 ± 0%          9.00 ± 0%     ~     (all equal)

name                     old lastIndex/op   new lastIndex/op   delta
RawNode/single-voter-10          2.00 ± 0%          2.00 ± 0%     ~     (all equal)
RawNode/two-voters-10            2.00 ± 0%          2.00 ± 0%     ~     (all equal)

name                     old ready/op       new ready/op       delta
RawNode/single-voter-10          2.00 ± 0%          2.00 ± 0%     ~     (all equal)
RawNode/two-voters-10            2.00 ± 0%          2.00 ± 0%     ~     (all equal)

name                     old term/op        new term/op        delta
RawNode/single-voter-10          0.00 ± 0%          0.00 ± 0%     ~     (all equal)
RawNode/two-voters-10            1.00 ± 0%          1.00 ± 0%     ~     (all equal)

name                     old alloc/op       new alloc/op       delta
RawNode/two-voters-10            965B ± 0%          637B ± 0%  -33.96%  (p=0.000 n=10+10)
RawNode/single-voter-10          428B ± 0%          420B ± 1%   -1.94%  (p=0.000 n=8+10)

name                     old allocs/op      new allocs/op      delta
RawNode/single-voter-10          5.00 ± 0%          5.00 ± 0%     ~     (all equal)
RawNode/two-voters-10            7.00 ± 0%          7.00 ± 0%     ~     (all equal)

When I looked into the allocs/op regression earlier (with 2c2960c), I found that it was due to the introduction of the msgsAfterAppend field, which splits outgoing messages across two slices. Since each slice needs its own backing array, we now pay two allocs for this bookkeeping instead of one.

raft/raft.go, lines 333 to 348 at 65a0bf3:

// msgs contains the list of messages that should be sent out immediately to
// other nodes.
//
// Messages in this list must target other nodes.
msgs []pb.Message
// msgsAfterAppend contains the list of messages that should be sent after
// the accumulated unstable state (e.g. term, vote, []entry, and snapshot)
// has been persisted to durable storage. This includes waiting for any
// unstable state that is already in the process of being persisted (i.e.
// has already been handed out in a prior Ready struct) to complete.
//
// Messages in this list may target other nodes or may target this node.
//
// Messages in this list have the type MsgAppResp, MsgVoteResp, or
// MsgPreVoteResp. See the comment in raft.send for details.
msgsAfterAppend []pb.Message
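
To illustrate the cost, here is a minimal sketch (hypothetical helper name, not the actual code): every Ready takes ownership of both slices and resets them to nil, so the next batch of messages pays one fresh backing-array allocation per slice.

// Hypothetical sketch: handing both slices to a Ready and nil-ing them out
// means the next r.send grows each slice from scratch, i.e. two allocs of
// bookkeeping per Ready iteration where there used to be one.
func takeMessages(r *raft) (msgs, msgsAfterAppend []pb.Message) {
	msgs, msgsAfterAppend = r.msgs, r.msgsAfterAppend
	r.msgs, r.msgsAfterAppend = nil, nil
	return msgs, msgsAfterAppend
}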

I think at some point we'll want to find a way to avoid reallocating either of these slices per Ready iteration in the common case, but that runs into questions of ownership because the slices can be handed back to the app. Do we want to expand the library's API for this? For the non-async writes logic, we can probably do something simpler to recycle the msgsAfterAppend slice across Ready iterations within RawNode, because the slice is never handed to the user of the library.
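
A sketch of what that simpler recycling could look like (hypothetical, and assuming RawNode fully consumes the slice before the next Ready when AsyncStorageWrites is off): truncate instead of nil-ing, so the backing array is reused.

// Hypothetical sketch, not the current code: once RawNode has handled the
// msgsAfterAppend messages internally (AsyncStorageWrites=false path), keep
// the backing array around by truncating to length zero rather than
// dropping it.
func (rn *RawNode) recycleMsgsAfterAppend() {
	for i := range rn.raft.msgsAfterAppend {
		rn.raft.msgsAfterAppend[i] = pb.Message{} // allow GC of payloads
	}
	rn.raft.msgsAfterAppend = rn.raft.msgsAfterAppend[:0] // keep capacity
}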

Do you think it's worth it to go part way and shave a heap allocation off the AsyncStorageWrites = false case (at the expense of a bit of slice re-use logic)? If so, I can look into that.


tbg commented Dec 22, 2022

> Do you think it's worth it to go part way and shave a heap allocation off the AsyncStorageWrites = false case (at the expense of a bit of slice re-use logic)? If so, I can look into that.

No, I don't think so. Thanks for the explanation, I had forgotten about 55e6a83.


tbg commented Dec 22, 2022

> I think at some point we'll want to find a way to avoid reallocating either of these slices per Ready iteration in the common case, but that runs into questions of ownership because the slices can be handed back to the app. Do we want to expand the library's API for this? For the non-async writes logic, we can probably do something simpler to recycle the msgsAfterAppend slice across Ready iterations within RawNode, because the slice is never handed to the user of the library.

Yeah, (as you know) these kinds of ownership questions keep coming up on our side of the fence (CRDB) as well, most recently here.

Maybe a model that could work is a static method in raft that takes a *[]raftpb.Entry, zeroes it, and puts the slice into a sync.Pool that RawNode pulls from. We could then say that raft does not claim ownership over any data it has handed out, but that the application is free to optimize allocations by returning slices to the pool once it no longer references them. This lets those who want to optimize do so, while sparing those who don't need that level of performance from pool-reuse bugs.
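
As a rough sketch of that model (hypothetical names; nothing like this exists in the library today):

// Hypothetical sketch of the opt-in pooling model described above.
var entryPool = sync.Pool{
	New: func() any { s := make([]raftpb.Entry, 0, 8); return &s },
}

// ReleaseEntries zeroes a slice previously handed out by raft and returns
// it to the pool. Calling it is strictly optional: applications that don't
// need the allocation savings can simply drop the slice and never risk a
// pool-reuse bug.
func ReleaseEntries(ents *[]raftpb.Entry) {
	for i := range *ents {
		(*ents)[i] = raftpb.Entry{} // drop references to Data payloads
	}
	*ents = (*ents)[:0]
	entryPool.Put(ents)
}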

It would also be nice to pool the entry.Data slice, which, I believe, raft never "owns" in the first place (i.e. it never allocates it, except for ConfChanges): entries come in either through Propose or from raft.Storage. So memory reuse is completely up to the app anyway. In CRDB, it's currently hard to track the lifecycle of these slices because they end up both on the raft transport and in the raft entry cache.
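
For completeness, an app-side sketch of what pooling entry.Data could look like (hypothetical; raft itself is not involved):

// Hypothetical app-side sketch: raft never allocates entry.Data (except for
// ConfChanges), so the application can pool the payload buffers it passes
// to Propose. The hard part, as noted above, is knowing when every holder
// (raft transport, entry cache, ...) has released the buffer.
var dataPool = sync.Pool{
	New: func() any { b := make([]byte, 0, 256); return &b },
}

func propose(rn *raft.RawNode, payload []byte) error {
	bp := dataPool.Get().(*[]byte)
	*bp = append((*bp)[:0], payload...)
	return rn.Propose(*bp) // return *bp to dataPool only once fully released
}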


tbg commented Dec 22, 2022

Filed #15 for discussing the memory reuse angle.
