Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance analysis and performance improvments of the networking code #1420

Closed
coot opened this issue Jan 8, 2020 · 3 comments
Closed
Labels
byron Required for a Byron mainnet: replace the old core nodes with cardano-node optimisation Performance optimisation
Milestone

Comments

@coot
Copy link
Contributor

coot commented Jan 8, 2020

@karknu performance analysis revealed:

COST CENTRE                           MODULE                                   SRC                                                               %time %alloc

mkCodecCborLazyBS                     Network.TypedProtocol.Codec.Cbor         src/Network/TypedProtocol/Codec/Cbor.hs:(110,1)-(126,53)            9.8   22.0

throwSocketErrorIfMinus1RetryMayBlock Network.Socket.Internal                  Network/Socket/Internal.hs:(177,1)-(178,53)                         7.4    0.3

mux                                   Network.Mux.Egress                       src/Network/Mux/Egress.hs:(112,1)-(117,75)                          7.0    3.6

driverSimple                          Network.TypedProtocol.Driver.Simple      src/Network/TypedProtocol/Driver/Simple.hs:(118,1)-(142,24)         6.0    2.6

fdPreadBuf                            System.Posix.IO.ByteString               src/System/Posix/IO/ByteString.hsc:(374,1)-(383,41)                 4.2    0.3

try                                   Ouroboros.Storage.Util.ErrorHandling     src/Ouroboros/Storage/Util/ErrorHandling.hs:73:1-72                 3.7    3.6

demux                                 Network.Mux.Ingress                      src/Network/Mux/Ingress.hs:(79,1)-(92,39)                           3.7    9.0

socketAsMuxBearer                     Network.Mux.Bearer.Socket                src/Network/Mux/Bearer/Socket.hs:(51,1)-(127,86)                    2.9    1.4
@coot coot added optimisation Performance optimisation priority high byron Required for a Byron mainnet: replace the old core nodes with cardano-node labels Jan 8, 2020
@coot coot added this to the S4 2020-01-16 milestone Jan 8, 2020
@coot coot added the networking label Jan 9, 2020
@karknu
Copy link
Contributor

karknu commented Jan 9, 2020

        Thu Jan  9 12:19 2020 Time and Allocation Profiling Report  (Final)

           cardano-node +RTS -T -I0 -A128M -N2 -I0 -n64m -p -RTS --config configuration/mainnet-proxy-follower-node4.yaml --genesis-file configuration/mainnet-genesis.json --genesis-hash 5f20df933584822601f9e3f8c024eb5eb252fe8cefb24d1317dc3d432e940ebb --port 3000 --topology configuration/ipv6-topology.json --database-path db --socket-dir socket/

        total time  =      259.37 secs   (518740 ticks @ 1000 us, 2 processors)
        total alloc = 114,349,304,880 bytes  (excludes profiling overheads)

COST CENTRE                           MODULE                                      SRC                                                                  %time %alloc

send.\.\.\                            Network.Socket.ByteString.Lazy.Posix        Network/Socket/ByteString/Lazy/Posix.hs:32:16-75                      11.5    0.1
muxChannel.send                       Network.Mux                                 src/Network/Mux.hs:(224,5)-(239,53)                                    4.0    1.2
wrapBlockedIndefinitely               Control.Monad.Class.MonadSTM                src/Control/Monad/Class/MonadSTM.hs:199:1-74                           3.5    8.3
fdPreadBuf                            System.Posix.IO.ByteString                  src/System/Posix/IO/ByteString.hsc:(374,1)-(383,41)                    3.1    0.3
mux.go                                Network.Mux.Egress                          src/Network/Mux/Egress.hs:(114,3)-(117,75)                             2.2    0.9
buildStep.go                          Codec.CBOR.Write                            src/Codec/CBOR/Write.hs:(94,5)-(178,68)                                1.9    1.5
recvBuf.\                             Network.Socket.Buffer                       Network/Socket/Buffer.hsc:(134,9)-(135,60)                             1.8    0.2
atomically                            Control.Monad.Class.MonadSTM                src/Control/Monad/Class/MonadSTM.hs:135:3-56                           1.6    1.2
convertCborDecoderLBS.go.trailing'    Network.TypedProtocol.Codec.Cbor            src/Network/TypedProtocol/Codec/Cbor.hs:145:39-80                      1.5    6.7
putTMVar                              Control.Monad.Class.MonadSTM.Strict         src/Control/Monad/Class/MonadSTM/Strict.hs:132:1-55                    1.5    0.1
convertCborDecoderLBS.go.\.cs         Network.TypedProtocol.Codec.Cbor            src/Network/TypedProtocol/Codec/Cbor.hs:154:45-64                      1.3    8.3
processSingleWanton                   Network.Mux.Egress                          src/Network/Mux/Egress.hs:(130,1)-(148,45)                             1.2    1.3
throwSocketErrorIfMinus1RetryMayBlock Network.Socket.Internal                     Network/Socket/Internal.hs:(177,1)-(178,53)                            1.1    0.6
readTVar                              Control.Monad.Class.MonadSTM.Strict         src/Control/Monad/Class/MonadSTM/Strict.hs:85:1-49                     1.1    0.3
writeTVar                             Control.Monad.Class.MonadSTM.Strict         src/Control/Monad/Class/MonadSTM/Strict.hs:(88,1)-(90,25)              1.1    0.6
throwSocketErrorIfMinus1Retry         Network.Socket.Internal                     Network/Socket/Internal.hs:180:1-55                                    1.1    0.2
bytesMP                               Codec.CBOR.Write                            src/Codec/CBOR/Write.hs:(371,1)-(372,76)                               1.1    1.0
<>.\                                  Codec.CBOR.Encoding                         src/Codec/CBOR/Encoding.hs:139:49-58                                   1.0    0.7
withOpenState                         Ouroboros.Storage.ImmutableDB.Impl.State    src/Ouroboros/Storage/ImmutableDB/Impl/State.hs:(246,1)-(287,42)       1.0    1.2
toLazyByteString                      Network.TypedProtocol.Codec.Cbor            src/Network/TypedProtocol/Codec/Cbor.hs:(158,1)-(165,58)               0.9    3.8
instructionHelper                     Ouroboros.Storage.ChainDB.Impl.Reader       src/Ouroboros/Storage/ChainDB/Impl/Reader.hs:(231,1)-(363,38)          0.8    1.4
encodeMuxSDU.hdr                      Network.Mux.Codec                           src/Network/Mux/Codec.hs:28:7-26                                       0.8    8.6
iteratorNextImpl.\                    Ouroboros.Storage.ImmutableDB.Impl.Iterator src/Ouroboros/Storage/ImmutableDB/Impl/Iterator.hs:(374,9)-(386,57)    0.6    1.1
pread.\                               Ouroboros.Storage.IO                        src-unix/Ouroboros/Storage/IO.hs:(103,5)-(104,83)                      0.2    1.7
p_string.p_char                       Text.JSON.Canonical.Parse                   Text/JSON/Canonical/Parse.hs:(181,9)-(182,60)                          0.1    1.2
fromCompactRedeemVerificationKey.bs   Cardano.Crypto.Signing.Redeem.Compact       src/Cardano/Crypto/Signing/Redeem/Compact.hs:(81,3)-(82,46)            0.1    2.0

Previous profile was from an Arm system, this is from an x86 vm.

@karknu
Copy link
Contributor

karknu commented Jan 9, 2020

From the profiling one can see that it is the actual sending of the data that is consuming CPU. send.\.\.\ is the mux thread doing to actual socket call and muxChannel.send is either chainsync or blockfetch responder waiting the mux thread to finish so that they can send more data.

We have an excessive amount of syscalls for every SDU we write, getSocketName and getSocketOption and the actual send (which is really writev).

getSocketName and getSocketOption is used to dynamically decide a maximum mux segment size and switching to static one (I tried with 13000) yields a 10% speed improvement.

However even with a segment size limit of 13000 bytes 99% of all messages will end up being far smaller, say around 700 bytes for a single empty block. By coalescing messages from the same miniprotocol while they are waiting for service we can reduce the number of write calls even further. Doing this improves the speed by 20%. At this point the client appears to be the bottle neck.

@karknu
Copy link
Contributor

karknu commented Jan 9, 2020

        Thu Jan  9 15:45 2020 Time and Allocation Profiling Report  (Final)

           cardano-node +RTS -T -I0 -A128M -N2 -I0 -n64m -pa -RTS --config configuration/mainnet-proxy-follower-node4.yaml --genesis-file configuration/mainnet-genesis.json --genesis-hash 5f20df933584822601f9e3f8c024eb5eb252fe8cefb24d1317dc3d432e940ebb --port 3000 --topology configuration/ipv6-topology.json --database-path db --socket-dir socket/

        total time  =      637.92 secs   (1275850 ticks @ 1000 us, 2 processors)
        total alloc = 108,139,507,952 bytes  (excludes profiling overheads)

COST CENTRE                                                                    MODULE                                                        SRC                                                                                                                                                     %time %alloc  ticks     bytes

IDLE                                                                           IDLE                                                          <built-in>                                                                                                                                               67.4    0.0  860414         0
OVERHEAD_of                                                                    PROFILING                                                     <built-in>                                                                                                                                                5.9    6.5  75048 7050400296
send.\.\.\                                                                     Network.Socket.ByteString.Lazy.Posix                          Network/Socket/ByteString/Lazy/Posix.hs:32:16-75                                                                                                          2.0    0.0  24906  51517648
muxChannel.send                                                                Network.Mux                                                   src/Network/Mux.hs:(224,5)-(249,53)                                                                                                                       1.7    1.8  21279 1969767424
fdPreadBuf                                                                     System.Posix.IO.ByteString                                    src/System/Posix/IO/ByteString.hsc:(374,1)-(383,41)                                                                                                       0.9    0.3  11226 295045472
mux.go                                                                         Network.Mux.Egress                                            src/Network/Mux/Egress.hs:(114,3)-(117,75)                                                                                                                0.7    0.5   8945 553005008
wrapBlockedIndefinitely                                                        Control.Monad.Class.MonadSTM                                  src/Control/Monad/Class/MonadSTM.hs:199:1-74                                                                                                              0.6    6.6   7886 7181204976
GC                                                                             GC                                                            <built-in>                                                                                                                                                0.6    0.0   7632    113216
buildStep.go                                                                   Codec.CBOR.Write                                              src/Codec/CBOR/Write.hs:(94,5)-(178,68)                                                                                                                   0.5    1.7   6543 1790538672
atomically                                                                     Control.Monad.Class.MonadSTM                                  src/Control/Monad/Class/MonadSTM.hs:135:3-56                                                                                                              0.5    1.2   5755 1327553992
writeTVar                                                                      Control.Monad.Class.MonadSTM.Strict                           src/Control/Monad/Class/MonadSTM/Strict.hs:(88,1)-(90,25)                                                                                                 0.4    0.6   5515 679959160
convertCborDecoderLBS.go.\.cs                                                  Network.TypedProtocol.Codec.Cbor                              src/Network/TypedProtocol/Codec/Cbor.hs:154:45-64                                                                                                         0.4    6.7   5173 7193101912
SYSTEM                                                                         SYSTEM                                                        <built-in>                                                                                                                                                0.4    0.1   5037  59155328
readTVar                                                                       Control.Monad.Class.MonadSTM.Strict                           src/Control/Monad/Class/MonadSTM/Strict.hs:85:1-49                                                                                                        0.4    0.3   4938 330777008
bytesMP                                                                        Codec.CBOR.Write                                              src/Codec/CBOR/Write.hs:(371,1)-(372,76)                                                                                                                  0.4    1.0   4800 1115143232
convertCborDecoderLBS.go.trailing'                                             Network.TypedProtocol.Codec.Cbor                              src/Network/TypedProtocol/Codec/Cbor.hs:145:39-80                                                                                                         0.4    5.4   4543 5831300976
toLazyByteString                                                               Network.TypedProtocol.Codec.Cbor                              src/Network/TypedProtocol/Codec/Cbor.hs:(158,1)-(165,58)                                                                                                  0.3    4.1   4039 4416897992
throwSocketErrorIfMinus1RetryMayBlock                                          Network.Socket.Internal                                       Network/Socket/Internal.hs:(177,1)-(178,53)                                                                                                               0.3    0.5   3951 547585952
<>.\                                                                           Codec.CBOR.Encoding                                           src/Codec/CBOR/Encoding.hs:139:49-58                                                                                                                      0.3    0.7   3789 771407888
recvBuf.\                                                                      Network.Socket.Buffer                                         Network/Socket/Buffer.hsc:(134,9)-(135,60)                                                                                                                0.3    0.2   3478 195192776
iteratorNextImpl.\.\                                                           Ouroboros.Storage.ImmutableDB.Impl.Iterator                   src/Ouroboros/Storage/ImmutableDB/Impl/Iterator.hs:(374,40)-(386,57)            

Profile with static max segment size and coalescing messages as described above

iohk-bors bot added a commit that referenced this issue Jan 10, 2020
1434: Perfermance improvements for sending. r=karknu a=karknu

Implements performance improvements discussed in #1420 .

Co-authored-by: Karl Knutsson <[email protected]>
@karknu karknu closed this as completed Jan 10, 2020
coot pushed a commit that referenced this issue May 16, 2022
1434: Perfermance improvements for sending. r=karknu a=karknu

Implements performance improvements discussed in #1420 .

Co-authored-by: Karl Knutsson <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
byron Required for a Byron mainnet: replace the old core nodes with cardano-node optimisation Performance optimisation
Projects
None yet
Development

No branches or pull requests

2 participants