Gap in chain (investigation) #26483
Comments
Ah, I get it.
I wonder what the filesystem is on |
BTW,
It's ambiguous to me. It feels like we are missing 90k items, but actually we are not. That said, it's very hard to know which item is the oldest one in the kv store. |
It's interesting. Usually we truncate the higher one by deleting items one by one. Honestly, it's pretty weird: it looks like the data file is missing some bodies, plus part of one body. |
Main issue where people have reported these cases: #22374 There, I asked a while ago whether |
This tells us the following:
This is also safely "within" a file, not near a crossover.
And then:
So, for the |
Ah, we probably do, but we don't think it's worth logging:
	log := t.logger.Debug
	if existing > items+1 {
		log = t.logger.Warn // Only loud warn if we delete multiple items
	}
	log("Truncating freezer table", "items", existing, "limit", items)
I think we should change that - deleting anything is definitely worth logging. Ah, actually, we can't change that -- see #21483
|
I downloaded the index files from a production node, to check the data layouts / sizes.
The data lost in receipts, is somewhere between
So around For
Around |
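For anyone repeating this kind of index inspection, a minimal sketch of decoding a freezer index file follows. It assumes each index entry is 6 bytes (a 2-byte big-endian data-file number followed by a 4-byte big-endian end offset), which is my reading of the freezer table layout and should be checked against the geth version in use. The names `parseIndex` and `itemSizes` are made up for illustration, and the sketch ignores any special meaning the first entry may have.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// indexEntrySize is the ASSUMED on-disk size of one freezer index entry.
const indexEntrySize = 6

// indexEntry mirrors the ASSUMED layout: data-file number, then the offset
// in that file at which the item ends.
type indexEntry struct {
	filenum uint16
	offset  uint32
}

// parseIndex decodes a raw index file into entries (hypothetical helper).
func parseIndex(raw []byte) ([]indexEntry, error) {
	if len(raw)%indexEntrySize != 0 {
		return nil, fmt.Errorf("index length %d is not a multiple of %d", len(raw), indexEntrySize)
	}
	entries := make([]indexEntry, 0, len(raw)/indexEntrySize)
	for i := 0; i < len(raw); i += indexEntrySize {
		entries = append(entries, indexEntry{
			filenum: binary.BigEndian.Uint16(raw[i : i+2]),
			offset:  binary.BigEndian.Uint32(raw[i+2 : i+6]),
		})
	}
	return entries, nil
}

// itemSizes derives one size per pair of consecutive entries. When the file
// number changes, the item is ASSUMED to start at offset 0 of the new file.
func itemSizes(entries []indexEntry) []uint32 {
	if len(entries) < 2 {
		return nil
	}
	sizes := make([]uint32, 0, len(entries)-1)
	for i := 1; i < len(entries); i++ {
		prev, cur := entries[i-1], entries[i]
		if cur.filenum != prev.filenum {
			sizes = append(sizes, cur.offset)
			continue
		}
		sizes = append(sizes, cur.offset-prev.offset)
	}
	return sizes
}

func main() {
	// Synthetic sample, not data from the affected node.
	raw := []byte{0, 1, 0, 0, 0, 10, 0, 1, 0, 0, 0, 25, 0, 2, 0, 0, 0, 7}
	entries, err := parseIndex(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(entries, itemSizes(entries))
}
```

An offset that goes backwards within the same file would wrap around in `itemSizes`; in a real check that case is worth flagging explicitly, since it points at a damaged index.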
@nisdas this is our thread investigating your issue, could you do a disk check on both the internal and external drive? |
I could only run smart checks on my internal drives, but here it is:
|
Hold up -- does that mean not the drive which the ancients are on? At this point, we're only really interested in the disk holding the ancients, the |
For my external drive, doing extensive disk checks will be difficult, as it would require me to restart my machine (my node is currently running with validators on it). I can't run the smart tool checks, as that requires me to disable a module in the kernel: https://www.smartmontools.org/ticket/971 . If it helps, I bought the external drive just 2 months ago. |
Similar error occurred on a
|
The 'block receipts' missing is curious, because during freeze, it reads the fields in this order:
	hash := ReadCanonicalHash(nfdb, number)
	if hash == (common.Hash{}) {
		return fmt.Errorf("canonical hash missing, can't freeze block %d", number)
	}
	header := ReadHeaderRLP(nfdb, hash, number)
	if len(header) == 0 {
		return fmt.Errorf("block header missing, can't freeze block %d", number)
	}
	body := ReadBodyRLP(nfdb, hash, number)
	if len(body) == 0 {
		return fmt.Errorf("block body missing, can't freeze block %d", number)
	}
	receipts := ReadReceiptsRLP(nfdb, hash, number)
	if len(receipts) == 0 {
		return fmt.Errorf("block receipts missing, can't freeze block %d", number)
	}
	td := ReadTdRLP(nfdb, hash, number)
	if len(td) == 0 {
		return fmt.Errorf("total difficulty missing, can't freeze block %d", number)
	}
So that means that the other fields are present in leveldb, but only the receipts are missing (and possibly td). Checking those, it seems that the
|
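Because the freeze code quoted above returns at the first empty field, a "block receipts missing" error confirms that hash, header and body were present but says nothing about td. A rough sketch of a diagnostic that checks every blob and reports all the empty ones is below. The `blockData` type and `missingFields` function are hypothetical and not part of geth; the blobs would have to be filled in from whatever reads the key-value store.

```go
package main

import "fmt"

// blockData is a HYPOTHETICAL container for the raw blobs of one block,
// as read from the key-value store.
type blockData struct {
	Header   []byte
	Body     []byte
	Receipts []byte
	Td       []byte
}

// missingFields returns the names of all empty blobs, in the same order the
// freezer reads them, instead of stopping at the first one.
func missingFields(d blockData) []string {
	checks := []struct {
		name string
		blob []byte
	}{
		{"block header", d.Header},
		{"block body", d.Body},
		{"block receipts", d.Receipts},
		{"total difficulty", d.Td},
	}
	var missing []string
	for _, c := range checks {
		if len(c.blob) == 0 {
			missing = append(missing, c.name)
		}
	}
	return missing
}

func main() {
	// Synthetic example: header and body present, receipts and td absent.
	d := blockData{Header: []byte{0x01}, Body: []byte{0x01}}
	fmt.Println(missingFields(d))
}
```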
Also, the child block, So it seems that for |
So, on a functioning node, I did
I took the output, and put it into a bash variable, and then did
That seems to have fixed it. I'm curious to see if it is indeed fixed, or if something else will turn up. |
This ticket is stale, closing |
Shutdown
16,380,262, 16,380,261 and 16,380,135
16,380,262 is the most recent state.
Startup:
Truncating freezer table only trimmed away 5 items: items=16,289,887 limit=16,289,882
16,289,882, as expected after the trim.
16380262, the difference here is 90380.
16380262 -- that is the HEAD. Why are we expecting the HEAD to link up with the ancients? Answer: we're not, normally, that's just a special-case when we do the insert-directly-to-ancients during sync. In normal case, if we wound up here, there's already a confirmed gap.
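The two figures quoted above (5 trimmed items and a difference of 90380) can be re-derived from the logged numbers. The short sketch below does only that arithmetic; the function names are made up for illustration, and it treats the truncation limit as the ancient boundary, as the discussion does.

```go
package main

import "fmt"

// trimmed returns how many items a truncation removed, given the item count
// before the truncation and the limit it was cut down to.
func trimmed(items, limit uint64) uint64 {
	return items - limit
}

// gap returns the distance between the head block number and the ancient
// boundary.
func gap(head, ancients uint64) uint64 {
	return head - ancients
}

func main() {
	const (
		head  uint64 = 16380262 // HEAD at startup
		items uint64 = 16289887 // "items" in the truncation log line
		limit uint64 = 16289882 // "limit" in the truncation log line
	)
	fmt.Println(trimmed(items, limit), gap(head, limit)) // prints "5 90380"
}
```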
Relevant code