Performance analysis and performance improvments of the networking code #1420

coot · 2020-01-08T05:54:04Z

@karknu performance analysis revealed:

COST CENTRE                           MODULE                                   SRC                                                               %time %alloc

mkCodecCborLazyBS                     Network.TypedProtocol.Codec.Cbor         src/Network/TypedProtocol/Codec/Cbor.hs:(110,1)-(126,53)            9.8   22.0

throwSocketErrorIfMinus1RetryMayBlock Network.Socket.Internal                  Network/Socket/Internal.hs:(177,1)-(178,53)                         7.4    0.3

mux                                   Network.Mux.Egress                       src/Network/Mux/Egress.hs:(112,1)-(117,75)                          7.0    3.6

driverSimple                          Network.TypedProtocol.Driver.Simple      src/Network/TypedProtocol/Driver/Simple.hs:(118,1)-(142,24)         6.0    2.6

fdPreadBuf                            System.Posix.IO.ByteString               src/System/Posix/IO/ByteString.hsc:(374,1)-(383,41)                 4.2    0.3

try                                   Ouroboros.Storage.Util.ErrorHandling     src/Ouroboros/Storage/Util/ErrorHandling.hs:73:1-72                 3.7    3.6

demux                                 Network.Mux.Ingress                      src/Network/Mux/Ingress.hs:(79,1)-(92,39)                           3.7    9.0

socketAsMuxBearer                     Network.Mux.Bearer.Socket                src/Network/Mux/Bearer/Socket.hs:(51,1)-(127,86)                    2.9    1.4

The text was updated successfully, but these errors were encountered:

karknu · 2020-01-09T13:40:27Z

        Thu Jan  9 12:19 2020 Time and Allocation Profiling Report  (Final)

           cardano-node +RTS -T -I0 -A128M -N2 -I0 -n64m -p -RTS --config configuration/mainnet-proxy-follower-node4.yaml --genesis-file configuration/mainnet-genesis.json --genesis-hash 5f20df933584822601f9e3f8c024eb5eb252fe8cefb24d1317dc3d432e940ebb --port 3000 --topology configuration/ipv6-topology.json --database-path db --socket-dir socket/

        total time  =      259.37 secs   (518740 ticks @ 1000 us, 2 processors)
        total alloc = 114,349,304,880 bytes  (excludes profiling overheads)

COST CENTRE                           MODULE                                      SRC                                                                  %time %alloc

send.\.\.\                            Network.Socket.ByteString.Lazy.Posix        Network/Socket/ByteString/Lazy/Posix.hs:32:16-75                      11.5    0.1
muxChannel.send                       Network.Mux                                 src/Network/Mux.hs:(224,5)-(239,53)                                    4.0    1.2
wrapBlockedIndefinitely               Control.Monad.Class.MonadSTM                src/Control/Monad/Class/MonadSTM.hs:199:1-74                           3.5    8.3
fdPreadBuf                            System.Posix.IO.ByteString                  src/System/Posix/IO/ByteString.hsc:(374,1)-(383,41)                    3.1    0.3
mux.go                                Network.Mux.Egress                          src/Network/Mux/Egress.hs:(114,3)-(117,75)                             2.2    0.9
buildStep.go                          Codec.CBOR.Write                            src/Codec/CBOR/Write.hs:(94,5)-(178,68)                                1.9    1.5
recvBuf.\                             Network.Socket.Buffer                       Network/Socket/Buffer.hsc:(134,9)-(135,60)                             1.8    0.2
atomically                            Control.Monad.Class.MonadSTM                src/Control/Monad/Class/MonadSTM.hs:135:3-56                           1.6    1.2
convertCborDecoderLBS.go.trailing'    Network.TypedProtocol.Codec.Cbor            src/Network/TypedProtocol/Codec/Cbor.hs:145:39-80                      1.5    6.7
putTMVar                              Control.Monad.Class.MonadSTM.Strict         src/Control/Monad/Class/MonadSTM/Strict.hs:132:1-55                    1.5    0.1
convertCborDecoderLBS.go.\.cs         Network.TypedProtocol.Codec.Cbor            src/Network/TypedProtocol/Codec/Cbor.hs:154:45-64                      1.3    8.3
processSingleWanton                   Network.Mux.Egress                          src/Network/Mux/Egress.hs:(130,1)-(148,45)                             1.2    1.3
throwSocketErrorIfMinus1RetryMayBlock Network.Socket.Internal                     Network/Socket/Internal.hs:(177,1)-(178,53)                            1.1    0.6
readTVar                              Control.Monad.Class.MonadSTM.Strict         src/Control/Monad/Class/MonadSTM/Strict.hs:85:1-49                     1.1    0.3
writeTVar                             Control.Monad.Class.MonadSTM.Strict         src/Control/Monad/Class/MonadSTM/Strict.hs:(88,1)-(90,25)              1.1    0.6
throwSocketErrorIfMinus1Retry         Network.Socket.Internal                     Network/Socket/Internal.hs:180:1-55                                    1.1    0.2
bytesMP                               Codec.CBOR.Write                            src/Codec/CBOR/Write.hs:(371,1)-(372,76)                               1.1    1.0
<>.\                                  Codec.CBOR.Encoding                         src/Codec/CBOR/Encoding.hs:139:49-58                                   1.0    0.7
withOpenState                         Ouroboros.Storage.ImmutableDB.Impl.State    src/Ouroboros/Storage/ImmutableDB/Impl/State.hs:(246,1)-(287,42)       1.0    1.2
toLazyByteString                      Network.TypedProtocol.Codec.Cbor            src/Network/TypedProtocol/Codec/Cbor.hs:(158,1)-(165,58)               0.9    3.8
instructionHelper                     Ouroboros.Storage.ChainDB.Impl.Reader       src/Ouroboros/Storage/ChainDB/Impl/Reader.hs:(231,1)-(363,38)          0.8    1.4
encodeMuxSDU.hdr                      Network.Mux.Codec                           src/Network/Mux/Codec.hs:28:7-26                                       0.8    8.6
iteratorNextImpl.\                    Ouroboros.Storage.ImmutableDB.Impl.Iterator src/Ouroboros/Storage/ImmutableDB/Impl/Iterator.hs:(374,9)-(386,57)    0.6    1.1
pread.\                               Ouroboros.Storage.IO                        src-unix/Ouroboros/Storage/IO.hs:(103,5)-(104,83)                      0.2    1.7
p_string.p_char                       Text.JSON.Canonical.Parse                   Text/JSON/Canonical/Parse.hs:(181,9)-(182,60)                          0.1    1.2
fromCompactRedeemVerificationKey.bs   Cardano.Crypto.Signing.Redeem.Compact       src/Cardano/Crypto/Signing/Redeem/Compact.hs:(81,3)-(82,46)            0.1    2.0

Previous profile was from an Arm system, this is from an x86 vm.

karknu · 2020-01-09T14:49:25Z

From the profiling one can see that it is the actual sending of the data that is consuming CPU. send.\.\.\ is the mux thread doing to actual socket call and muxChannel.send is either chainsync or blockfetch responder waiting the mux thread to finish so that they can send more data.

We have an excessive amount of syscalls for every SDU we write, getSocketName and getSocketOption and the actual send (which is really writev).

getSocketName and getSocketOption is used to dynamically decide a maximum mux segment size and switching to static one (I tried with 13000) yields a 10% speed improvement.

However even with a segment size limit of 13000 bytes 99% of all messages will end up being far smaller, say around 700 bytes for a single empty block. By coalescing messages from the same miniprotocol while they are waiting for service we can reduce the number of write calls even further. Doing this improves the speed by 20%. At this point the client appears to be the bottle neck.

karknu · 2020-01-09T14:51:09Z

        Thu Jan  9 15:45 2020 Time and Allocation Profiling Report  (Final)

           cardano-node +RTS -T -I0 -A128M -N2 -I0 -n64m -pa -RTS --config configuration/mainnet-proxy-follower-node4.yaml --genesis-file configuration/mainnet-genesis.json --genesis-hash 5f20df933584822601f9e3f8c024eb5eb252fe8cefb24d1317dc3d432e940ebb --port 3000 --topology configuration/ipv6-topology.json --database-path db --socket-dir socket/

        total time  =      637.92 secs   (1275850 ticks @ 1000 us, 2 processors)
        total alloc = 108,139,507,952 bytes  (excludes profiling overheads)

COST CENTRE                                                                    MODULE                                                        SRC                                                                                                                                                     %time %alloc  ticks     bytes

IDLE                                                                           IDLE                                                          <built-in>                                                                                                                                               67.4    0.0  860414         0
OVERHEAD_of                                                                    PROFILING                                                     <built-in>                                                                                                                                                5.9    6.5  75048 7050400296
send.\.\.\                                                                     Network.Socket.ByteString.Lazy.Posix                          Network/Socket/ByteString/Lazy/Posix.hs:32:16-75                                                                                                          2.0    0.0  24906  51517648
muxChannel.send                                                                Network.Mux                                                   src/Network/Mux.hs:(224,5)-(249,53)                                                                                                                       1.7    1.8  21279 1969767424
fdPreadBuf                                                                     System.Posix.IO.ByteString                                    src/System/Posix/IO/ByteString.hsc:(374,1)-(383,41)                                                                                                       0.9    0.3  11226 295045472
mux.go                                                                         Network.Mux.Egress                                            src/Network/Mux/Egress.hs:(114,3)-(117,75)                                                                                                                0.7    0.5   8945 553005008
wrapBlockedIndefinitely                                                        Control.Monad.Class.MonadSTM                                  src/Control/Monad/Class/MonadSTM.hs:199:1-74                                                                                                              0.6    6.6   7886 7181204976
GC                                                                             GC                                                            <built-in>                                                                                                                                                0.6    0.0   7632    113216
buildStep.go                                                                   Codec.CBOR.Write                                              src/Codec/CBOR/Write.hs:(94,5)-(178,68)                                                                                                                   0.5    1.7   6543 1790538672
atomically                                                                     Control.Monad.Class.MonadSTM                                  src/Control/Monad/Class/MonadSTM.hs:135:3-56                                                                                                              0.5    1.2   5755 1327553992
writeTVar                                                                      Control.Monad.Class.MonadSTM.Strict                           src/Control/Monad/Class/MonadSTM/Strict.hs:(88,1)-(90,25)                                                                                                 0.4    0.6   5515 679959160
convertCborDecoderLBS.go.\.cs                                                  Network.TypedProtocol.Codec.Cbor                              src/Network/TypedProtocol/Codec/Cbor.hs:154:45-64                                                                                                         0.4    6.7   5173 7193101912
SYSTEM                                                                         SYSTEM                                                        <built-in>                                                                                                                                                0.4    0.1   5037  59155328
readTVar                                                                       Control.Monad.Class.MonadSTM.Strict                           src/Control/Monad/Class/MonadSTM/Strict.hs:85:1-49                                                                                                        0.4    0.3   4938 330777008
bytesMP                                                                        Codec.CBOR.Write                                              src/Codec/CBOR/Write.hs:(371,1)-(372,76)                                                                                                                  0.4    1.0   4800 1115143232
convertCborDecoderLBS.go.trailing'                                             Network.TypedProtocol.Codec.Cbor                              src/Network/TypedProtocol/Codec/Cbor.hs:145:39-80                                                                                                         0.4    5.4   4543 5831300976
toLazyByteString                                                               Network.TypedProtocol.Codec.Cbor                              src/Network/TypedProtocol/Codec/Cbor.hs:(158,1)-(165,58)                                                                                                  0.3    4.1   4039 4416897992
throwSocketErrorIfMinus1RetryMayBlock                                          Network.Socket.Internal                                       Network/Socket/Internal.hs:(177,1)-(178,53)                                                                                                               0.3    0.5   3951 547585952
<>.\                                                                           Codec.CBOR.Encoding                                           src/Codec/CBOR/Encoding.hs:139:49-58                                                                                                                      0.3    0.7   3789 771407888
recvBuf.\                                                                      Network.Socket.Buffer                                         Network/Socket/Buffer.hsc:(134,9)-(135,60)                                                                                                                0.3    0.2   3478 195192776
iteratorNextImpl.\.\                                                           Ouroboros.Storage.ImmutableDB.Impl.Iterator                   src/Ouroboros/Storage/ImmutableDB/Impl/Iterator.hs:(374,40)-(386,57)

Profile with static max segment size and coalescing messages as described above

1434: Perfermance improvements for sending. r=karknu a=karknu Implements performance improvements discussed in #1420 . Co-authored-by: Karl Knutsson <[email protected]>

coot added optimisation Performance optimisation priority high byron Required for a Byron mainnet: replace the old core nodes with cardano-node labels Jan 8, 2020

coot added this to the S4 2020-01-16 milestone Jan 8, 2020

edsko mentioned this issue Jan 8, 2020

Audit memory allocation #1392

Closed

coot added the networking label Jan 9, 2020

karknu mentioned this issue Jan 9, 2020

Perfermance improvements for sending. #1434

Merged

iohk-bors bot added a commit that referenced this issue Jan 10, 2020

Merge #1434

64577b5

1434: Perfermance improvements for sending. r=karknu a=karknu Implements performance improvements discussed in #1420 . Co-authored-by: Karl Knutsson <[email protected]>

karknu closed this as completed Jan 10, 2020

coot pushed a commit that referenced this issue May 16, 2022

Merge #1434

7ceac57

1434: Perfermance improvements for sending. r=karknu a=karknu Implements performance improvements discussed in #1420 . Co-authored-by: Karl Knutsson <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance analysis and performance improvments of the networking code #1420

Performance analysis and performance improvments of the networking code #1420

coot commented Jan 8, 2020

karknu commented Jan 9, 2020

karknu commented Jan 9, 2020

karknu commented Jan 9, 2020

Performance analysis and performance improvments of the networking code #1420

Performance analysis and performance improvments of the networking code #1420

Comments

coot commented Jan 8, 2020

karknu commented Jan 9, 2020

karknu commented Jan 9, 2020

karknu commented Jan 9, 2020