Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NEO3 testnet stoppped after running for about six hours. #1084

Closed
superboyiii opened this issue Sep 3, 2019 · 18 comments · Fixed by #1083
Closed

NEO3 testnet stoppped after running for about six hours. #1084

superboyiii opened this issue Sep 3, 2019 · 18 comments · Fixed by #1083
Labels
Bug Used to tag confirmed bugs

Comments

@superboyiii
Copy link
Member

superboyiii commented Sep 3, 2019

Describe the bug
Running NEO3 testnet for about 6 hours, CN3(index=3) node hadn't open wallet so that every 7 blocks when CN3 is primary, recovery will be sent and then CN2(index=2) will be the Primary to generate block and go on. However, there's a time when Primary is turn to CN2 from CN3, because of something unknown, CN2 is time out then view changed to CN3 again. After that, there're always recovery message received and view hasn't changed anymore. The next day, when we restart all consensus nodes, it returns null Payloads on neo\neo\Consensus\ConsensusService.cs:line 149

To Reproduce
Steps to reproduce the behavior:

  1. Open a privatenet with 7 nodes.
  2. Make CN3 close wallet as watch-only.
  3. Keep running and make CN2 timeout.

Expected behavior
Private net was broken...

Screenshots
image

Platform:

The issue block is on 1326

@cloud8little
Copy link
Contributor

@nicolegys
Copy link
Contributor

@vncoelho
Copy link
Member

vncoelho commented Sep 3, 2019

You guys are testing everything. hauaahah

Congratulations.

@erikzhang
Copy link
Member

I think there are two issues.

The first is that the consensus stops. From the log point of view, the consensus node seems to be split into two groups, 0, 4, 5 and 1, 2, 6. It looks a bit like a fork.

The second is that restarting the consensus node will cause a NullReferenceException on recovery. This has been fixed in #1083.

@neo-project/core Please check the first issue.

@erikzhang erikzhang added the Bug Used to tag confirmed bugs label Sep 3, 2019
@vncoelho
Copy link
Member

vncoelho commented Sep 4, 2019

@erikzhang, an interesting thing was that one of the recovery payloads had a changeview that was not valid, because there was a print saying 3/4.

I am having some difficulties for opening the logs. I just saw briefly on @superboyiii 's computer

@vncoelho
Copy link
Member

vncoelho commented Sep 4, 2019

@superboyiii @cloud8little @erikzhang,
First of all, thanks for you help and attention for providing the logs.

Perhaps one of the possible errors can be seen here:

[Info][17:47:20.753] send commit (C2)
[Info][17:47:20.787] OnCommitReceived: height=1326 view=2 index=2 nc=1 nf=1 (C3)
[Info][17:47:20.797] OnCommitReceived: height=1326 view=2 index=0 nc=2 nf=1 (C1)
[Info][17:47:20.825] OnCommitReceived: height=1326 view=2 index=6 nc=2 nf=1 (C7)
[Info][17:47:20.837] OnCommitReceived: height=1326 view=2 index=5 nc=3 nf=1 (C6)
[Info][17:47:20.925] OnCommitReceived: height=1326 view=2 index=0 nc=3 nf=1
[Info][17:47:21.040] OnCommitReceived: height=1326 view=2 index=5 nc=3 nf=1
[Info][17:47:21.041] OnPrepareResponseReceived: height=1326 view=2 index=4
[Info][17:47:21.092] OnCommitReceived: height=1326 view=2 index=4 nc=3 nf=1 (C5)

Take a look at this, we already have 6 commits! aheuahuea
Block should have been produced.

For some reason the checkcommits() is not being triggered or not passing.

if (hashData == null)
{
existingCommitPayload = payload;
}
else if (Crypto.Default.VerifySignature(hashData, commit.Signature,
context.Validators[payload.ValidatorIndex].EncodePoint(false)))
{
existingCommitPayload = payload;
CheckCommits();
}

There is something really strange going on, the logs also keep processing the commit payloads...

We should improve the logs.

@vncoelho
Copy link
Member

vncoelho commented Sep 4, 2019

Since the OnCommitReceived message is still hapenning, it makes sense that the following situation is not !=null:

            ref ConsensusPayload existingCommitPayload = ref context.CommitPayloads[payload.ValidatorIndex];
            if (existingCommitPayload != null)
            {
                if (existingCommitPayload.Hash != payload.Hash)
                    Log($"{nameof(OnCommitReceived)}: different commit from validator! height={payload.BlockIndex} index={payload.ValidatorIndex} view={commit.ViewNumber} existingView={existingCommitPayload.ConsensusMessage.ViewNumber}", LogLevel.Warning);
                return;
            }

For some reason the payloads are not being stored.

@shargon
Copy link
Member

shargon commented Sep 10, 2019

Something weird, in CN3's logs

image

But if is watchOnly, this couldn't happen...

image

@shargon
Copy link
Member

shargon commented Sep 10, 2019

Also i can't find any height=1326 view=0 index=1 role=Primary :S

@cloud8little
Copy link
Contributor

cloud8little commented Sep 10, 2019

@shargon Owen means index=3, which actually is this node 2019-09-03(c4).log.

@shargon
Copy link
Member

shargon commented Sep 10, 2019

How many seconds between blocks?

@cloud8little
Copy link
Contributor

@shargon 15secs

@superboyiii
Copy link
Member Author

superboyiii commented Sep 10, 2019

@shargon This is already fixed in #1083. We're making more test on this PR to ensure no addtional issues appear.

@shargon
Copy link
Member

shargon commented Sep 10, 2019

@superboyiii both issues?

@superboyiii
Copy link
Member Author

@superboyiii both issues?

Yes, both.

@shargon
Copy link
Member

shargon commented Sep 10, 2019

And do you know why this happens?

#1084 (comment)

@lock9
Copy link
Contributor

lock9 commented Sep 10, 2019

@superboyiii how can we ensure this has been solved? Are there steps we can follow to reproduce this error? Can it be reproduced using unit tests?

@erikzhang
Copy link
Member

It has been fixed, I think. The cause of this bug has been found.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Used to tag confirmed bugs
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants