
Testnet Integration test Plutus APP does not start #189

Closed
volodyad opened this issue Dec 12, 2021 · 25 comments
Assignees
Labels
bug Something isn't working

Comments

@volodyad

volodyad commented Dec 12, 2021

Summary

On running integration test PAB start takes forever,
used the latest main 46e831e

Steps to reproduce the behavior

Run the testnet example

Actual Result

I have been waiting around 3-4 hours and got only the following output. Tried several times.

Current block: 100000. Current slot: 27243687
Current block: 200000. Current slot: 33116605

CPU usage is excessive
[Screenshot 2021-12-12 at 17 56 47: CPU usage]

Expected Result

PAB starts successfully.

Describe the approach you would take to fix this

No response

System info

macOS Big Sur 11.5.2
16 GB RAM
2.6 GHz 6-Core Intel Core i7

46e831e

@volodyad volodyad added the bug Something isn't working label Dec 12, 2021
@volodyad volodyad changed the title Plutus APP does not start Testnet Integration test Plutus APP does not start Dec 12, 2021
@raduom
Contributor

raduom commented Dec 13, 2021

I am working on offering the possibility to synchronise the PAB starting with a given blockid. Unless your contract needs access to historical data, that should help a lot with the time required for synchronisation.

@raduom
Contributor

raduom commented Dec 13, 2021

There will also be a fix that will allow you to specify the maximum rollback that you want to do which will help with the memory consumption and CPU usage (which is due to some problem with the parallel GC in Haskell).

@raduom
Contributor

raduom commented Dec 13, 2021

If you want to reduce the CPU usage you can specify the following runtime options for your Haskell binary: +RTS -qg -I0. The option -qg will turn off the parallel garbage collector (reducing the CPU usage to at most 100%) and -I0 turns off idle GC, which will make the CPU usage go down when fully synchronised.
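
(For illustration, one way to pass those flags, assuming you run the examples through cabal and that the binary is linked with -rtsopts; the executable name and its other arguments here are placeholders, not something confirmed in this thread:)

  cabal run plutus-pab-examples -- <your-usual-arguments> +RTS -qg -I0 -RTS

If the binary was not built with -rtsopts, GHC will reject most RTS flags at startup; rebuilding with ghc-options: -rtsopts (or baking the flags in via -with-rtsopts) is the usual way around that.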

@volodyad
Author

volodyad commented Dec 13, 2021

Hello @raduom, two weeks ago sync took about 20 minutes; were there any changes that caused the sync time to increase?

@luigy
Contributor

luigy commented Dec 13, 2021

@volodyad reverting #174 brought the time to sync back down to numbers I was seeing 2 weeks ago

I am working on offering the possibility to synchronise the PAB starting with a given blockid.

@raduom Nice! This will certainly save me time :). Currently I have to wait 8 to 20 minutes to test my changes.

@raduom
Contributor

raduom commented Dec 13, 2021

Honestly, the 20 minutes for full synchronisation is a problem that we intend to solve by providing some sort of checkpointing (saving state as we process the chain). However, that is not really on the current sprint and we don't know exactly when we will be able to schedule it. We hope that the option to synchronise from a given block id will be an acceptable workaround until we get to implement persistent state for the PAB.

@volodyad
Author

volodyad commented Dec 13, 2021

@luigy , it helped, thank you

@raduom the issue is that on the main branch it does not start at all; I have been waiting for hours.

@mikekeke
Contributor

I see the same issue on this commit while trying to start the PAB for my contract. I didn't notice high CPU usage, but I did notice high RAM usage (over 7 GB) - I had to extend swap so the system wouldn't kill the PAB. After the swap was extended, I waited for an hour to sync, but the console log stayed at: Starting PAB backend server on port 9080

@mikekeke
Contributor

@luigy Can you tell whether that issue affects only chain-index or plutus-pab too? That is, can I use the latest commit for the PAB executable and build just chain-index from an older one?

@raduom
Contributor

raduom commented Dec 14, 2021

What network are you trying to synchronise with?

@mikekeke
Contributor

I was trying the public testnet (magic 1097911063).

@vlasin

vlasin commented Dec 15, 2021

I have a very similar experience to what was described above (Ubuntu 21.10). For me, the PAB integration test (and my own PAB program) breaks exactly at the December 7 commit (46527f2). I have tried 3 scenarios:

  1. Use the previous commit for both plutus-chain-index and plutus-pab-examples -> the test runs successfully.
  2. Use this commit for plutus-chain-index and the previous one for plutus-pab-examples -> the synchronization takes 5-10 minutes as usual, but the PAB is killed due to out-of-memory error (16 GB RAM).
  3. Use this commit for both plutus-chain-index and plutus-pab-examples -> I start getting synchronization messages after an hour+ but then an out-of-memory error before fully synchronizing. Also tried this on another PC (Ubuntu 20.04, 32 GB RAM) with the same result.

In all tests, I used cardano-node v1.30.1 and cardano-wallet v2021-11-11.

Probably, something needs adjusting in the PAB code after this particular commit?

@raduom
Contributor

raduom commented Dec 15, 2021

Fixing the memory leaks caused the RAM usage to go up (now you no longer have unevaluated thunks, you have the real deal). I tested the PAB with the shelley_qa testnet and it used around 8 GB to synchronise. I would really advise you to use that testnet.

I considered not fixing this issue (since I knew that it would temporarily increase RAM usage); however, certain queries would cause the unevaluated thunks to partially evaluate, which caused a much worse memory usage explosion, making the PAB's memory consumption unpredictable - which in my opinion is worse.

That said, I am working on a PR that will land today or tomorrow that allows you to specify on the command line (for the PAB) how much history you want to retain. If you don't want to test rollbacks then you can set the history to 1 block, which should help with the memory usage (and since the CPU usage is connected to the memory usage, it should help with that too).

If you are in a rush, you can temporarily change the 500 value to 1 here: https://github.com/input-output-hk/plutus-apps/blob/835ce24b7c89fa0d49e985029defc15b6732785e/plutus-chain-index-core/src/Plutus/ChainIndex/UtxoState.hs#L132
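
(For illustration only, the idea behind that constant is roughly as follows. This is a simplified sketch with made-up names, not the actual UtxoState code; it models the index as a plain list rather than the FingerTree the chain-index really uses:)

  -- Simplified sketch of depth-based trimming of the in-memory UTXO index.
  newtype UtxoIndex s = UtxoIndex [s]    -- newest block state first

  -- How many recent block states are kept around to support rollbacks; the
  -- constant referenced above plays this role (500 by default).
  maxRollbackDepth :: Int
  maxRollbackDepth = 1                   -- 1 if you never test rollbacks

  -- After inserting a new block state, drop everything older than the allowed depth.
  trimIndex :: UtxoIndex s -> UtxoIndex s
  trimIndex (UtxoIndex states) = UtxoIndex (take maxRollbackDepth states)

  insertBlock :: s -> UtxoIndex s -> UtxoIndex s
  insertBlock st (UtxoIndex states) = trimIndex (UtxoIndex (st : states))

Keeping that depth small bounds how much history stays live on the heap, which is why lowering the value reduces memory pressure (and, indirectly, GC/CPU load).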

There was another question about sync time increasing (@luigy). The memory leaks were due to unevaluated thunks, which we now fully evaluate. That is probably why it takes a bit longer.

I would also suspect that the huge sync times / failures are due to GHC's parallel GC going berserk under pressure. You can test that by turning off the parallel GC using the +RTS -qg -I0 arguments for the PAB binary.

@volodyad
Author

volodyad commented Dec 15, 2021

I have used +RTS -qg -I0
and changed the log settings to show more:

  if (s `mod` 10_000 == 0 && s > 0) || (s >= recentSlot)

In UtxoState I reverted the state to the previous version, without trimIndex:

  let (before, after) = FT.split ((s <=) . snd) ix

It got stuck around here; sync is very slow, roughly a minute per 10,000 slots:

Current block: 121107. Current slot: 28070000
Current block: 125566. Current slot: 28250000
Current block: 127509. Current slot: 28330000

At this slot cardano-node's CPU is no longer being used; on previous steps it was around 90%.
The Plutus app's CPU utilization is around 100%.
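
(Side note on the FT.split line quoted above: Data.FingerTree's split divides the structure at the point where a predicate on the accumulated measure first becomes true. A minimal, self-contained sketch of the same shape, using a toy Max-of-slot measure instead of the real UtxoState measure:)

  {-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}
  import           Data.FingerTree (FingerTree, Measured (..), split)
  import qualified Data.FingerTree as FT
  import           Data.Semigroup  (Max (..))

  -- A toy per-block entry, measured by its slot number.
  newtype Block = Block { blockSlot :: Int } deriving Show

  instance Measured (Max Int) Block where
    measure = Max . blockSlot

  -- Split the index into blocks strictly before slot s and blocks from slot s
  -- onwards, mirroring the shape of the FT.split ((s <=) . snd) ix call above.
  splitAtSlot :: Int -> FingerTree (Max Int) Block
              -> (FingerTree (Max Int) Block, FingerTree (Max Int) Block)
  splitAtSlot s = split (\(Max slot) -> s <= slot)

  example :: (FingerTree (Max Int) Block, FingerTree (Max Int) Block)
  example = splitAtSlot 3 (FT.fromList [Block 1, Block 2, Block 3, Block 4])
  -- fst example holds Block 1 and Block 2; snd example holds Block 3 onwards.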

@raduom
Contributor

raduom commented Dec 15, 2021

There is little chance of good things happening without trimIndex. Give me a bit of time and I will have a PR addressing these memory issues (I suspect that the slowness is due to increased pressure on the GC, but we'll see more when I get there).

@volodyad
Author

It worked faster before #174; what else could affect this?

@raduom
Contributor

raduom commented Dec 15, 2021

Please read my previous answer. Specifically:

There was another question about sync time increasing (@luigy). The memory leaks were due to unevaluated thunks, which we now fully evaluate. That is probably why it takes a bit longer.

@raduom
Contributor

raduom commented Dec 15, 2021

The code here should work fine if you want to test it: #191
I managed to synchronise with shelley_qa (43kk blocks) while using only 6 GB of RAM. I still need to test it on the testnet, but the current changes look promising.

It still takes quite a bit of time though.

@vlasin

vlasin commented Dec 16, 2021

Thanks, @raduom. After changing the trimIndex parameter, it worked for me. And the memory usage seems more predictable indeed, although it took a few hours to synchronize. I will be waiting for more synchronization options for the PAB.

@raduom
Contributor

raduom commented Dec 16, 2021

I would run it with --rollback-history 1 if you don't need any rollbacks. Both memory usage and synchronisation speed are improved.
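
(For example, something along these lines, where the executable name and its other arguments are placeholders rather than anything confirmed in this thread:)

  cabal run plutus-pab-examples -- <your-usual-arguments> --rollback-history 1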

@volodyad
Author

volodyad commented Dec 17, 2021

Tried --rollback-history 1. After 20 minutes it slowed down at: Current block: 17822. Current slot: 6980000

Actually, what kind of data is loaded? As a workaround, can we set a start slot from which to begin syncing?
When just testing with a new wallet and a new contract, and not planning to use older UTXOs, can I just skip loading everything?

@raduom
Contributor

raduom commented Dec 17, 2021

There is a bug in the rollback-history code that I will get to fix today, probably.

Actually what kind of data is loaded? As a workardound can we setup start slot to start sync?

Yes. Probably at the beginning of next week.

@volodyad
Author

There is a bug in the rollback-history code that I will get to fix today, probably.

Please let me know if there is any success, so I can try it.

@raduom
Contributor

raduom commented Dec 20, 2021

You can try out: #210

@raduom raduom self-assigned this Dec 22, 2021
@raduom
Contributor

raduom commented Dec 22, 2021

If there is no further feedback on this issue, I would like to close it.

@raduom raduom closed this as completed Jan 18, 2022