Quorum-Raft fall into leader change loop in stress test #927
Comments
@WesleyLiu0717 can you share more details about how you were sending transactions? |
Hi @amalrajmani,
The code has been pushed to the following repository:
Two files are needed:
{
"coinbase": "0xtestAddress",
"serv": {
"hosts": [
"ip1",
"ip2",
"ip3",
"ip4"
],
"rpc_port": "rpcport",
"ws_port": "wsport"
},
"signature": {
"pass_phrase": "YourPassWord"
},
"sol_settings": {
"compiler_version": "v0.4.25+commit.59dbf8f1"
},
"smart_contract": {
"abi": "contract abi",
"address": "contract address",
"bytecode": "contract bytecode",
"document": {
"id_beginning": 10000
},
"employee": {
"id_beginning": 1012200,
"overall_amount": 1000
}
}
}
privateKeys.json:
[
{
"address": "testAddress1",
"private_key": "testPrivateKey1"
},
{
"address": "testAddress2",
"private_key": "testPrivateKey2"
}
]
(You need to prepare 1000 accounts for 1000 Tx/s.)
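A minimal sketch of how the key file above could be checked before a run. It assumes one account per transaction per second, as the note suggests; the function name `load_accounts` and the `target_tps` parameter are illustrative and not part of the original test code.

```python
import json

def load_accounts(raw_json, target_tps):
    """Parse a privateKeys.json-style list and check it can cover target_tps."""
    accounts = json.loads(raw_json)
    addresses = [entry["address"] for entry in accounts]
    # Each sender should appear only once, otherwise locally tracked nonces collide.
    if len(set(addresses)) != len(addresses):
        raise ValueError("duplicate address in key file")
    # Assumption: one account sends one transaction per second.
    if len(accounts) < target_tps:
        raise ValueError(f"need {target_tps} accounts, found {len(accounts)}")
    return accounts

# Placeholder entries copied from the privateKeys.json example above.
sample = (
    '[{"address": "testAddress1", "private_key": "testPrivateKey1"},'
    ' {"address": "testAddress2", "private_key": "testPrivateKey2"}]'
)
accounts = load_accounts(sample, target_tps=2)
```

With the two-entry sample, asking for `target_tps=1000` raises a `ValueError`, which is the situation the note warns about.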
|
@WesleyLiu0717 thanks for sharing all the information. We have been running simulations in our stress test environment. Though there were leader elections, none of these loops went beyond 100. However, from the logs you shared, I notice that the leader election ran for several hours in your case before the network started processing transactions again. I also notice that node4 initiates the leader election and the other nodes respond immediately. However, node4 does not process the response, which results in another leader election loop. The response from the other nodes is processed only after a delay of a few hours, and this delay has caused the loop. |
@WesleyLiu0717 we have been able to reproduce the issue at our end and we are checking for possible solutions. We will get back to you. Thanks for your patience. |
@vsmk98 Thanks! We will continue the experiments. If there are any new results, I'll update here. |
@WesleyLiu0717 we have executed several stress tests in the last few days and based on the test result we feel that the issue is with the way
Can you please execute your tests with JMeter HTTP provider or Websocket and let us know the test results? |
@amalrajmani
And there are some questions about your test
|
@WesleyLiu0717
Please note that my network set up is as follows: |
Hi, @amalrajmani
There are some observations of the experiments:
I'm not sure if the above observations are related to this issue. |
Hi @WesleyLiu0717,
Many thanks! |
Per @amalrajmani's comment, if you are trying to get performance stats from Ethereum / Quorum, using a wrapper implementation such as web3js is going to be very slow: it is optimized for usability, not for performance. If you want metrics of what a node / chain can do, you should hit the JSON-RPC (JRPC) endpoints directly. |
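As a rough illustration of hitting the JSON-RPC endpoint directly, the sketch below builds a JSON-RPC 2.0 request for the standard `eth_sendRawTransaction` method and posts it with the Python standard library. The helper names and the signed-transaction placeholder are assumptions for illustration; the URL would come from the `hosts` and `rpc_port` values in the config.

```python
import json
import urllib.request

def build_request(method, params, request_id=1):
    """Build a JSON-RPC 2.0 request body as a plain dict."""
    return {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}

def post_json_rpc(url, payload, timeout=10):
    """POST the payload to a node's HTTP RPC endpoint and return the decoded reply."""
    body = json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))

# "0xSIGNED_TX_HEX" is a placeholder for a transaction signed offline.
payload = build_request("eth_sendRawTransaction", ["0xSIGNED_TX_HEX"])
```

A call such as `post_json_rpc("http://ip1:rpcport", payload)` would then send the request to the first host, with `ip1` and `rpcport` replaced by real values.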
Hi @amalrajmani @fixanoid, For the slowness in sync, it looks faster after replacing web3 with JRPC.
|
Hi @WesleyLiu0717 - To support private transactions, where the private state can differ across nodes, Quorum does not use |
Hi, I'm not sure why the experiments always get stuck after we send transactions continuously for a period of time. Even after we upgraded the hardware, the chain eventually gets stuck. The following graph shows the CPU usage of the experiments. We grepped the 'err', 'raft', and 'Imported' messages from the raft logs of these experiments and put them at the following link: (The file is too big to upload to GitHub: 1 GB.) We want to use raft in our product. Are there any suggested settings? |
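The log filtering described above can be approximated in a few lines. This is only a sketch of a case-sensitive substring filter for the three keywords; the sample lines are made-up placeholders, not real Quorum log output.

```python
from collections import Counter

KEYWORDS = ("err", "raft", "Imported")

def filter_log(lines, keywords=KEYWORDS):
    """Keep lines containing any keyword and count how often each keyword matched."""
    kept = []
    counts = Counter()
    for line in lines:
        hits = [keyword for keyword in keywords if keyword in line]
        if hits:
            kept.append(line)
            counts.update(hits)
    return kept, counts

# Placeholder lines for illustration only.
sample_lines = [
    "line A: Imported something",
    "line B: raft something",
    "line C: nothing relevant",
    "line D: raft err something",
]
kept, counts = filter_log(sample_lines)
```

Counting matches per keyword makes it easier to compare runs, for example to see whether raft-related lines grow much faster than 'Imported' lines once the chain stalls.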
Hi @WesleyLiu0717, I have looked at the logs and notice the following errors in the log:
Also, in all the runs, I see a large number of leader re-elections. Can you please share the Thanks |
Hi @vsmk98,
Each test transaction costs 262059 gas, so at most 700000000/262059 = 2671.xx transactions fit in one block. If the minter packs more than 2671 transactions into a block, a "gas limit reached" error occurs. I'm not sure whether a transaction that is left out of the block when this error occurs is thrown away or stays in the transaction pool. If it is thrown away, I think that is why the "nonce too high" error occurs, because the nonces of our transactions are maintained locally. In my observations, leader re-election finishes quickly at the beginning of the experiment, but after a period of time the nodes can't find a leader. We are confused about why the experiment with an input rate of 300 can run successfully for one hour but fails after three hours. We can't find an input rate at which the system works continuously for a long time. |
Hi @WesleyLiu0717 - thanks for sharing the information. It's a bit surprising to me as well, as we ran with HTTP calls for 4.5 hours and saw a TPS of 1000+, as mentioned by @amalrajmani earlier. Would it be possible for you to join the Quorum Slack channel - http://bit.ly/quorum-slack to discuss this further? Please let me know. |
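The block capacity arithmetic above can be checked with a short sketch. The two constants are the figures quoted in the comment (700000000 is taken to be the block gas limit, which is an assumption); the function names are illustrative.

```python
GAS_LIMIT = 700_000_000  # figure from the comment, assumed to be the block gas limit
GAS_PER_TX = 262_059     # gas used by one test transaction, per the comment

def max_txs_per_block(gas_limit, gas_per_tx):
    """Whole number of transactions that fit under the gas limit."""
    return gas_limit // gas_per_tx

def blocks_needed(tx_count, gas_limit, gas_per_tx):
    """Minimum number of blocks required to include tx_count transactions."""
    capacity = max_txs_per_block(gas_limit, gas_per_tx)
    return -(-tx_count // capacity)  # ceiling division

capacity = max_txs_per_block(GAS_LIMIT, GAS_PER_TX)
```

Here `capacity` is 2671, so a backlog of 2672 pending transactions already needs a second block.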
Hi @vsmk98, Thanks! |
Closing. Please reopen if more info is needed. |
Hi there,
We are stress testing quorum-raft.
We send 1000 contract function transactions per second to 4 nodes for 16 hours.
(250 Tx/s to each node)
In the beginning, all the transactions are confirmed.
But after a while, one of the raft nodes broadcasts a leader change message, and all the nodes fall into a "leader change loop" and can't mine blocks.
System information
Quorum version: v2.4.0
OS & Version: Ubuntu 18.04
Expected behaviour
The leader change mechanism reliably elects a leader.
Actual behaviour
The election organizer (node4) can't receive the MsgVoteResp from some of the other nodes.
The following is part of the log. The raft log and block information are in the repository:
https://github.com/WesleyLiu0717/quorum-raft-experiment
Steps to reproduce the behaviour
1. Deploy the following smart contract.
2. Use 1000 different addresses to send 250 "create" contract function transactions per second to each node for more than 2 hours.
Note:
We have tried blockTime 50ms/250ms, and the same problem occurs.
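The load split in step 2 can be sketched as follows, assuming each of the 1000 addresses sends one transaction per second and the addresses are divided evenly across the 4 nodes. The host and address names are the placeholders from the config example; `split_accounts` is a hypothetical helper, not part of the original test code.

```python
def split_accounts(accounts, hosts):
    """Assign accounts to hosts round-robin so each host gets an equal share."""
    return {host: accounts[index::len(hosts)] for index, host in enumerate(hosts)}

hosts = ["ip1", "ip2", "ip3", "ip4"]
accounts = [f"testAddress{number}" for number in range(1, 1001)]

plan = split_accounts(accounts, hosts)
# Assumption: one transaction per account per second, so the per-node rate
# equals the number of accounts assigned to that node.
per_node_rate = {host: len(assigned) for host, assigned in plan.items()}
```

Keeping each address pinned to one node also means its locally tracked nonce is only ever submitted through that node.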
Smart Contract