Potential seq halt risk #3630

Open
giskook opened this issue May 15, 2024 · 7 comments
giskook commented May 15, 2024

System information

zkEVM Node version: v0.6.7
OS & Version: Linux
Commit hash: (-)
Network: Mainnet/Testnet

Expected behaviour

Actual behaviour

X Layer (node v0.6.7) under load test will halt the sequencer.

Steps to reproduce the behaviour

Under load test, when a new L2 block is created, the sequencer may halt because the previous block has not yet been fully processed.

Backtrace

node's log (X Layer's, but the logic is the same)

/src/log/log.go:142 github.com/0xPolygonHermez/zkevm-node/log.appendStackTraceMaybeArgs()
/src/log/log.go:251 github.com/0xPolygonHermez/zkevm-node/log.Errorf()
/src/sequencer/finalizer.go:957 github.com/0xPolygonHermez/zkevm-node/sequencer.(*finalizer).Halt()
/src/sequencer/l2block.go:646 github.com/0xPolygonHermez/zkevm-node/sequencer.(*finalizer).openNewWIPL2Block()
/src/sequencer/l2block.go:529 github.com/0xPolygonHermez/zkevm-node/sequencer.(*finalizer).finalizeWIPL2Block()
/src/sequencer/finalizer.go:399 github.com/0xPolygonHermez/zkevm-node/sequencer.(*finalizer).finalizeBatches()
/src/sequencer/finalizer.go:186 github.com/0xPolygonHermez/zkevm-node/sequencer.(*finalizer).Start()
)","pid":1,"version":"v0.3.9-5-gbcd78f1b","stacktrace":"github.com/0xPolygonHermez/zkevm-node/sequencer.(*finalizer).Halt
	/src/sequencer/finalizer.go:957
github.com/0xPolygonHermez/zkevm-node/sequencer.(*finalizer).openNewWIPL2Block
	/src/sequencer/l2block.go:646
github.com/0xPolygonHermez/zkevm-node/sequencer.(*finalizer).finalizeWIPL2Block
	/src/sequencer/l2block.go:529
github.com/0xPolygonHermez/zkevm-node/sequencer.(*finalizer).finalizeBatches
	/src/sequencer/finalizer.go:399
github.com/0xPolygonHermez/zkevm-node/sequencer.(*finalizer).Start
	/src/sequencer/finalizer.go:186"

or

failed to execute new wip L2 block [274], error: key not found in the database 
/src/log/log.go:142 github.com/0xPolygonHermez/zkevm-node/log.appendStackTraceMaybeArgs()
/src/log/log.go:251 github.com/0xPolygonHermez/zkevm-node/log.Errorf()
/src/sequencer/finalizer.go:957 github.com/0xPolygonHermez/zkevm-node/sequencer.(*finalizer).Halt()
/src/sequencer/l2block.go:646 github.com/0xPolygonHermez/zkevm-node/sequencer.(*finalizer).openNewWIPL2Block()
/src/sequencer/batch.go:206 github.com/0xPolygonHermez/zkevm-node/sequencer.(*finalizer).finalizeWIPBatch()
/src/sequencer/finalizer.go:470 github.com/0xPolygonHermez/zkevm-node/sequencer.(*finalizer).finalizeBatches()
/src/sequencer/finalizer.go:186 github.com/0xPolygonHermez/zkevm-node/sequencer.(*finalizer).Start()
)","pid":1,"version":"v0.3.9-1-g01328c71","stacktrace":"github.com/0xPolygonHermez/zkevm-node/sequencer.(*finalizer).Halt
	/src/sequencer/finalizer.go:957
github.com/0xPolygonHermez/zkevm-node/sequencer.(*finalizer).openNewWIPL2Block
	/src/sequencer/l2block.go:646
github.com/0xPolygonHermez/zkevm-node/sequencer.(*finalizer).finalizeWIPBatch
	/src/sequencer/batch.go:206
github.com/0xPolygonHermez/zkevm-node/sequencer.(*finalizer).finalizeBatches
	/src/sequencer/finalizer.go:470
github.com/0xPolygonHermez/zkevm-node/sequencer.(*finalizer).Start
	/src/sequencer/finalizer.go:186"

executor's log

20240513_044417_037003 2999306 df7f640 ExecutorServiceImpl::ProcessBatchV2() got sequencerAddr=d6dda5aa7749142b7fda3fe4662c9f346101b8a6 batchL2DataLength=9 batchL2Data=0x0b0000000300000000...0b0000000300000000 oldStateRoot=3f39566612c98487a4cb091740077f56aff652512fe5401bf7c2d25b4c925fdd oldAccInputHash=0 oldBatchNum=573763 chainId=195 forkId=9 generated globalExitRoot=0 timestampLimit=1715575457 from=0x bUpdateMerkleTree=0 bNoCounters=0 bGetKeys=0 bSkipVerifyL1InfoRoot=1 bSkipFirstChangeL2Block=0 bSkipWriteBlockInfoRoot=1 traceConfig=bEnabled=0,bDisableStorage=0,bDisableStack=0,bEnableMemory=0,bEnableReturnData=0,txHashToGenerateFullTrace= UUID=f83e4c2b-e84b-42b7-9f62-00dfcc803642 gasLimit=0 l1InfoTreeData.size=1=120975, stateOverride.size=0

20240513_044419_537014 2999306 df7f640 zkWarning: Database::read() failed calling readRemote() with error=ZKR_DB_KEY_NOT_FOUND; will retry after 100000us key=59f00c7c75346d1b11c70e26613ad99ffbe59f486e63bdcd6b4e25aaac3d7a93 i=9

20240513_044419_662293 2999306 df7f640 zkError: MainExecutor::logError() proverRequest.result=ZKR_DB_KEY_NOT_FOUND step=0 eval=153 zkPC=11880 rom.line={ inFREE=1 inFREE0=0 freeInTag={} jmpAddr=13163 call=1 useJmpAddr=1 setSR=1 sWR=1 fileName=process-change-l2-block.zkasm line=36 lineStr=$=> SR:SSTORE, CALL(initBlockInfoTree)} uuid=f83e4c2b-e84b-42b7-9f62-00dfcc803642


giskook commented May 15, 2024

Fix PR: #3629

giskook commented May 17, 2024

It seems that increasing the sequencer executor's dbMTCacheSize can cover this situation.

giskook commented May 17, 2024

This only happens when the sequencer sets SequentialProcessL2Block to false.

giskook commented May 18, 2024

I think that since PR #3604, the sequencer may halt under load-test conditions.

Root cause: the WIP batch (processed tx-by-tx) builds its state in the executor's dbMTCache, but dbMTCache is an LRU cache, so after StateRootSyncInterval has elapsed, old state may have been evicted by the executor. Then, when the sequencer opens a new L2 block, it may fail because of the missing key.

Ways to mitigate:

  • increase the executor's dbMTCacheSize
  • decrease StateRootSyncInterval
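The eviction scenario above can be sketched with a toy LRU cache (illustrative only; this is a minimal stand-in for the executor's dbMTCache, not its actual implementation, and the key names are hypothetical):

```go
package main

import (
	"container/list"
	"fmt"
)

// lru is a toy LRU cache standing in for the executor's dbMTCache.
type lru struct {
	cap   int
	order *list.List               // front = most recently used
	items map[string]*list.Element // key -> element holding the key
}

func newLRU(capacity int) *lru {
	return &lru{cap: capacity, order: list.New(), items: map[string]*list.Element{}}
}

func (c *lru) put(key string) {
	if el, ok := c.items[key]; ok {
		c.order.MoveToFront(el)
		return
	}
	if c.order.Len() == c.cap { // full: evict the least recently used key
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(string))
	}
	c.items[key] = c.order.PushFront(key)
}

func (c *lru) get(key string) bool {
	el, ok := c.items[key]
	if ok {
		c.order.MoveToFront(el)
	}
	return ok
}

func main() {
	cache := newLRU(3) // tiny dbMTCache

	// With a long StateRootSyncInterval, a state root needed later
	// lives only in the cache for a long time...
	cache.put("root-of-block-N")

	// ...while sustained load keeps inserting newer state nodes.
	for i := 0; i < 3; i++ {
		cache.put(fmt.Sprintf("node-%d", i))
	}

	// Opening the next WIP L2 block needs the old root, but the LRU
	// evicted it: the analogue of ZKR_DB_KEY_NOT_FOUND.
	if !cache.get("root-of-block-N") {
		fmt.Println("key not found in the database") // this line prints
	}
}
```

A larger capacity (bigger dbMTCacheSize) or shorter write intervals (smaller StateRootSyncInterval) both shrink the window in which the needed key can be evicted.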

X Layer testnet may halt with the following configuration:
sequencer

SequentialProcessL2Block = false
Sequencer.Finalizer.StateRootSyncInterval = "3600s"

executor

    "dbMTCacheSize": 1024,
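Applying the two mitigations above, a safer sequencer setting might look like this (the 300s value is illustrative, not a tested recommendation), together with a larger `dbMTCacheSize` (e.g. 8192 instead of 1024) on the executor side:

```toml
SequentialProcessL2Block = false
# Sync state roots more frequently so state held only in the executor's
# LRU cache is persisted before it can be evicted.
Sequencer.Finalizer.StateRootSyncInterval = "300s"
```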

agnusmor commented May 21, 2024

Hi, this is a known issue with the associative cache in the executor/prover. To fix it, you need to disable it (useAssociativeCache = false) in the executor configuration. The prover team is working to fix the issue.
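For example, in the executor's JSON config (key casing assumed to match the other flags shown later in this thread):

```json
{
    "useAssociativeCache": false
}
```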

agnusmor self-assigned this May 22, 2024
giskook commented May 28, 2024

useAssociativeCache

Our executor does not set useAssociativeCache; after checking the prover's code, I think it defaults to false.

giskook commented May 28, 2024

Executor's configuration:

{
    "runExecutorServer": true,
    "runExecutorClient": false,
    "runExecutorClientMultithread": false,

    "runHashDBServer": true,
    "runHashDBTest": false,

    "runAggregatorServer": false,
    "runAggregatorClient": false,

    "runFileGenBatchProof": false,
    "runFileGenAggregatedProof": false,
    "runFileGenFinalProof": false,
    "runFileProcessBatch": false,
    "runFileProcessBatchMultithread": false,
    "runFileExecutor": false,

    "runKeccakScriptGenerator": false,
    "runKeccakTest": false,
    "runStorageSMTest": false,
    "runBinarySMTest": false,
    "runMemAlignSMTest": false,
    "runSHA256Test": false,
    "runBlakeTest": false,

    "executeInParallel": true,
    "useMainExecGenerated": true,
    "useProcessBatchCache": false,

    "saveRequestToFile": false,
    "saveInputToFile": false,
    "saveDbReadsToFile": false,
    "saveDbReadsToFileOnChange": false,
    "saveOutputToFile": false,
    "saveProofToFile": false,
    "saveResponseToFile": false,
    "saveFilesInSubfolders": false,

    "loadDBToMemCache": false,
    "loadDBToMemCacheInParallel": false,
    "loadDBToMemTimeout": 30000000,
    "dbMTCacheSize": 1024,
    "dbProgramCacheSize": 1024,

    "opcodeTracer": false,
    "logRemoteDbReads": false,
    "logExecutorServerInput": false,
    "logExecutorServerInputGasThreshold": 1048576,
    "logExecutorServerResponses": false,
    "logExecutorServerTxs": false,

    "executorServerPort": 50071,
    "executorROMLineTraces": false,
    "executorTimeStatistics": false,
    "executorClientPort": 50071,
    "executorClientHost": "127.0.0.1",
    "executorClientLoops": 1,

    "hashDBServerPort": 50061,
    "hashDBURL": "local",

    "aggregatorServerPort": 50081,
    "aggregatorClientPort": 50081,
    "aggregatorClientHost": "127.0.0.1",
    "aggregatorClientMockTimeout": 10000000,
    "aggregatorClientWatchdogTimeout": 60000000,

    "mapConstPolsFile": false,
    "mapConstantsTreeFile": false,

    "inputFile": "testvectors/performance/121_tx_input.json",
    "inputFile2": "testvectors/aggregatedProof/recursive1.zkin.proof_1.json",

    "outputPath": "runtime/output",
    "configPath": "config",

    "zkevmCmPols_disabled": "runtime/zkevm.commit",
    "zkevmCmPolsAfterExecutor_disabled": "runtime/zkevm.commit",
    "c12aCmPols": "runtime/c12a.commit",
    "recursive1CmPols_disabled": "runtime/recursive1.commit",
    "recursive2CmPols_disabled": "runtime/recursive2.commit",
    "recursivefCmPols_disabled": "runtime/recursivef.commit",
    "finalCmPols_disabled": "runtime/final.commit",

    "publicsOutput": "public.json",
    "proofFile": "proof.json",

    "databaseURL": "postgresql://xgon_prover_user:xxxxx@xgon-state-pap.cluster-cpvlnksujjmh.ap-east-1.rds.amazonaws.com:5432/xgon_prover_db",
    "dbNodesTableName": "state.nodes",
    "dbProgramTableName": "state.program",
    "dbMultiWrite": true,
    "dbFlushInParallel": false,
    "dbConnectionsPool": true,
    "dbNumberOfPoolConnections": 506,
    "dbMetrics": false,
    "dbClearCache": false,
    "dbGetTree": true,
    "dbReadRetryDelay": 100000,
    "cleanerPollingPeriod": 600,
    "requestsPersistence": 3600,
    "maxExecutorThreads": 100,
    "maxProverThreads": 100,
    "maxHashDBThreads": 200
}
