Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue124 #145

Merged
merged 4 commits into from
Mar 7, 2018
Merged

Issue124 #145

merged 4 commits into from
Mar 7, 2018

Conversation

AionJayT
Copy link
Collaborator

@AionJayT AionJayT commented Mar 7, 2018

fix issue
#124

Due to the synchronized method cause the deadlock between the pendingTxImpl and the txpool and then the event manager callback function get stuck.

Note. each handler of the event manager is a single thread. the callback execute function don't need the synchronized term unless there is another thread calling the same method.

@AionJayT AionJayT requested review from aion-jin and iamyulong March 7, 2018 19:52
@iamyulong
Copy link
Contributor

iamyulong commented Mar 7, 2018

Could you please elaborate which methods cause the deadock, for which threads?

I'm assuming a deadlock model like this:

T1: calls pendingTxImpl's synchronized method, which calls a synchronized method of txpool
T2: calls txpool's synchronized method, which calls a synchronized method of pendingTxImpl

@AionJayT
Copy link
Collaborator Author

AionJayT commented Mar 7, 2018

New block Sealed -> OnBest event -> (BlockHandler thread, T1) callback-> createNewBlockTemplate -> getPendingTransactions -> txpool.snapshot.

pow thread (AionPow, T2) -> createNewBlockTemplate -> getPendingTransactions -> txpool.snapshot.

@AionJayT
Copy link
Collaborator Author

AionJayT commented Mar 7, 2018

EquiHash miner (multi tread here!) -> OnSolution event -> (Consensus thread) callback -> event been process by the order of the event queue in the Consensus handler -> processSolution -> import block -> if the best block -> clearPandingTx -> txpool.remove

@AionJayT
Copy link
Collaborator Author

AionJayT commented Mar 7, 2018

Found one Java-level deadlock:
=============================
"pool-5-thread-11":
  waiting to lock monitor 0x00007f7a8c02b900 (object 0x00000006c77ca150, a org.aion.zero.impl.blockchain.AionPendingStateImpl),
  which is held by "BlkHdr"
"BlkHdr":
  waiting to lock monitor 0x00007f7b0000d480 (object 0x00000006c77ca0f0, a org.aion.zero.impl.blockchain.NonceMgr),
  which is held by "pool-5-thread-11"

Java stack information for the threads listed above:
===================================================
"pool-5-thread-11":
	at org.aion.zero.impl.blockchain.AionPendingStateImpl.bestNonceSet(AionPendingStateImpl.java:583)
	- waiting to lock <0x00000006c77ca150> (a org.aion.zero.impl.blockchain.AionPendingStateImpl)
	at org.aion.zero.impl.blockchain.NonceMgr.getNonceAndAdd(NonceMgr.java:84)
	- locked <0x00000006c77ca0f0> (a org.aion.zero.impl.blockchain.NonceMgr)
	at org.aion.api.server.ApiAion.getTxNonce(ApiAion.java:498)
	- locked <0x00000006c7400948> (a org.aion.api.server.pb.ApiAion0)
	at org.aion.api.server.ApiAion.getTxNonce(ApiAion.java:494)
	- locked <0x00000006c7400948> (a org.aion.api.server.pb.ApiAion0)
	at org.aion.api.server.ApiAion.sendTransaction(ApiAion.java:454)
	at org.aion.api.server.pb.ApiAion0.process(ApiAion0.java:441)
	at org.aion.api.server.zmq.HdlrZmq.process(HdlrZmq.java:61)
	at org.aion.api.server.zmq.ProtocolProcessor.workerRun(ProtocolProcessor.java:265)
	at org.aion.api.server.zmq.ProtocolProcessor.lambda$run$3(ProtocolProcessor.java:110)
	at org.aion.api.server.zmq.ProtocolProcessor$$Lambda$58/1836303785.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1167)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:641)
	at java.lang.Thread.run([email protected]/Thread.java:844)
"BlkHdr":
	at org.aion.zero.impl.blockchain.NonceMgr.flush(NonceMgr.java:132)
	- waiting to lock <0x00000006c77ca0f0> (a org.aion.zero.impl.blockchain.NonceMgr)
	at org.aion.zero.impl.blockchain.AionPendingStateImpl.processBest(AionPendingStateImpl.java:412)
	- locked <0x00000006c77ca150> (a org.aion.zero.impl.blockchain.AionPendingStateImpl)
	at org.aion.zero.impl.blockchain.AionPendingStateImpl$1.onBest(AionPendingStateImpl.java:170)
	at org.aion.zero.impl.blockchain.AionPendingStateImpl$1.onBest(AionPendingStateImpl.java:168)
	at org.aion.evtmgr.impl.handler.BlockHandler.dispatch(BlockHandler.java:63)
	at org.aion.evtmgr.impl.abs.AbstractHandler$1.run(AbstractHandler.java:66)

Found 1 deadlock.

@iamyulong
Copy link
Contributor

iamyulong commented Mar 7, 2018

Awesome!

Let's focus on fixing the deadlock between NonceMgr and AionPendingStateImpl.

Copy link
Contributor

@iamyulong iamyulong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR removes synchronized keywords, because the methods are called by single thread. Needs to be careful when adding another invoke in the future

@iamyulong iamyulong merged commit 6d200fe into dev Mar 7, 2018
@AionJayT AionJayT deleted the issue124 branch April 6, 2018 14:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants