
Sync times are significantly slower in tags/1.4.0 #488

Closed
oneEdoubleD opened this issue Jan 21, 2020 · 10 comments
Labels: bug (Something isn't working), priority high (issues/PRs that MUST be addressed; the release can't happen without this)

@oneEdoubleD
Contributor

Issue

Sync times in tags/1.4.0 are significantly slower. Across four tests, observed by two different people, a complete sync can take upwards of 10 hours. Right now I am at epoch 44 after 3 hours.

We saw much better performance with 1.3.0, with several syncs taking less than 3 hours.

Note that these tests are being run on remote nodes. Due to input-output-hk/cardano-byron-proxy#87 we are unable to test against a local proxy, which normally yields some improvement (though not enough to explain the regression here).

@karknu
Contributor

karknu commented Jan 21, 2020

You should be able to test against a local cardano-node to get around the byron-proxy bug.
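
For reference, a rough sketch of that kind of setup (not the exact commands used in these tests; the file names, ports, and paths here are hypothetical, and the flags are the ones visible in the ps output quoted later in this thread): run an already-synced node locally as the upstream peer and point the node under test at 127.0.0.1 instead of the remote byron-proxy.

# Sketch only. CONFIG_ARGS stands in for the --genesis-file/--genesis-hash/--config
# flags shown in the ps output further down; topology-local.yaml (hypothetical)
# would list 127.0.0.1:3001 as the only upstream peer.
CONFIG_ARGS="--genesis-file ./genesis.json --config ./config.json"

# Already-synced local node acting as the upstream peer
cardano-node $CONFIG_ARGS --database-path ./db-synced --socket-dir ./socket-a \
  --topology ./topology-remote.yaml --host-addr 127.0.0.1 --port 3001 &

# Node under test, syncing only from the local peer
cardano-node $CONFIG_ARGS --database-path ./db-test --socket-dir ./socket-b \
  --topology ./topology-local.yaml --host-addr 127.0.0.1 --port 3002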

@oneEdoubleD
Contributor Author

> You should be able to test against a local cardano-node to get around the byron-proxy bug.

True! I can set a test up to run overnight; however, it's still 5-6 hours longer than syncing without a local proxy in 1.3.0. I'm currently syncing without tracers to see if that helps at all.

@vhulchenko-iohk vhulchenko-iohk added the bug Something isn't working label Jan 21, 2020
@oneEdoubleD oneEdoubleD added this to the S5 2020-01-30 milestone Jan 22, 2020
@karknu
Contributor

karknu commented Jan 22, 2020

@oneEdoubleD What are the specs on the machine you're running the tests on? What is the CPU load on it?

@mrBliss
Contributor

mrBliss commented Jan 22, 2020

Is this using -N2 (or more)? How "remote" and loaded is the node? Is the network or the CPU, on the client or the server, the bottleneck?
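
For anyone reproducing this: GHC RTS options such as -N and -A are passed between +RTS and -RTS on the node's command line. This assumes the executable was built with the threaded RTS and with RTS options enabled, which is an assumption here, not something confirmed in the thread.

# Hedged example: run the node with 2 capabilities and a 128 MB allocation area.
# Assumes the nix wrapper (./result) forwards extra arguments to cardano-node
# and that RTS options are enabled in the build.
./result +RTS -N2 -A128M -RTS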

@karknu
Contributor

karknu commented Jan 22, 2020

I've measured the sync speed (as MB written to the DB) over 10 minutes and there isn't a significant decrease in performance.
Syncing towards a byron-proxy running on the same LAN; the client is a Rock Pi ARM board. The client has 100% CPU load for the duration of the test, which is expected.
Tests were only run with -N1 due to an SMP bug on ARM.

MB written to the DB in 10 minutes:

RTS Args       1.3   1.4
-N1 -A128M     75M   75M
-N1            55M   50M
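
A rough way to reproduce that measurement (a sketch; the path is whatever --database-path points at, db-staging in the invocation quoted further down): sample the size of the chain database before and after a 10-minute window.

# Sketch: approximate MB written to the chain DB over 10 minutes.
DB=./db-staging                      # whatever --database-path points at
before=$(du -sm "$DB" | cut -f1)
sleep 600
after=$(du -sm "$DB" | cut -f1)
echo "wrote $((after - before)) MB in 10 minutes"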

@i-o-m i-o-m added the priority high issues/PRs that MUST be addressed. The release can't happen without this; label Jan 23, 2020
@oneEdoubleD
Contributor Author

These were my build steps - they're very 'vanilla':

git clone https://github.com/input-output-hk/cardano-node
cd cardano-node
nix-build -A scripts.staging.node
./result
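
To compare releases the same way, the checkout can be pinned to a tag before building (a sketch, assuming the scripts.staging.node attribute exists at both tags; -o just names the result symlink):

# Build 1.3.0 and 1.4.0 side by side for comparison.
git checkout tags/1.3.0 && nix-build -A scripts.staging.node -o result-1.3.0
git checkout tags/1.4.0 && nix-build -A scripts.staging.node -o result-1.4.0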

@ArturWieczorek
Contributor

I used exactly the same commands as @oneEdoubleD to build the node.

branch: master
environment: staging

168 epochs synced in 7 hours and 11 minutes.

I am running it inside a VM on my desktop PC:
The VM has 14 GB of RAM and 4 logical processors allocated (the host has 12 logical processors).

The CPU load was around 25-28%.

ps aux | grep node

/nix/store/31rvi6ilf7baspi03fj22qgjldsq5nsa-cardano-node-exes/bin/cardano-node --genesis-file /nix/store/ciwax7jcb8g22kr90vy8vax4bfx0r450-mainnet-genesis-dryrun-with-stakeholders.json --genesis-hash c6a004d3d178f600cd8caa10abbebe1549bef878f0665aea2903472d5abf7323 --config /nix/store/46df1j1m854ap3j0sabjq8ggxwifv1hn-config-0.json --database-path .//db-staging --socket-dir .//socket --topology /nix/store/nk2rzyc1zppv06rg8583p48mn3r5b84l-topology.yaml --host-addr 127.0.0.1 --port 3001
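
For completeness, one way to capture a per-process CPU figure like the 25-28% above (a sketch using standard tools; not necessarily how it was measured here):

# Sample the cardano-node process's CPU usage and resident memory every 5 seconds.
pid=$(pgrep -f cardano-node | head -n1)
while true; do
  ps -o %cpu=,rss= -p "$pid"
  sleep 5
done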

@CodiePP
Contributor

CodiePP commented Jan 28, 2020

I think this "bug" is related to the performance issue we saw when syncing from multiple nodes at the same time. The topology used by the nix script is targeting four nodes.

@disassembler
Contributor

While I agree with @CodiePP that it may help, it's the same topology we have used for the last 3 releases, so that does not explain the regression.

karknu added a commit that referenced this issue Jan 29, 2020
iohk-bors bot added a commit to IntersectMBO/ouroboros-network that referenced this issue Jan 29, 2020
1525: Lower maxConcurrencyBulkSync to 1 r=karknu a=karknu

Concurrent bulk downloads are currently very costly.
Disable them for now.
Part of a workaround for IntersectMBO/cardano-node#488

Co-authored-by: Karl Knutsson <[email protected]>
dcoutts pushed a commit that referenced this issue Jan 31, 2020
@oneEdoubleD
Contributor Author

Now that @karknu's confirmed fix has been merged into master and we're now on 1.5.0, I'll close this issue. The node is syncing significantly faster, completing in ~1 hour.

#506
