Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: return a corruption error when chunk head out of sequence #7855

Merged

Conversation

lentil1016
Copy link
Contributor

@lentil1016 lentil1016 commented Aug 26, 2020

No description provided.

@lentil1016 lentil1016 changed the title fix: return a corruption error when iterator function find a chunk th… [WIP] fix: return a corruption error Aug 26, 2020
@lentil1016 lentil1016 force-pushed the fix-head-out-of-order-deletion branch 3 times, most recently from 571cf2a to e4eb611 Compare August 26, 2020 12:16
@lentil1016 lentil1016 changed the title [WIP] fix: return a corruption error [WIP] fix: return a corruption error when out of sequence Aug 26, 2020
@lentil1016 lentil1016 changed the title [WIP] fix: return a corruption error when out of sequence [WIP] fix: return a corruption error when chunk head out of sequence Aug 26, 2020
Copy link
Member

@codesome codesome left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. Can you brief about what you are trying to do exactly? Because what you mentioned in the title of the PR would not fix the the linked issues. (and out of sequence files is not a data corruption)

@lentil1016 lentil1016 force-pushed the fix-head-out-of-order-deletion branch from e4eb611 to 8747da7 Compare August 26, 2020 13:20
@codesome
Copy link
Member

Also, this PR gave me a hint why #7753 was happening, thanks! When we error out in IterateAllChunks, we are not setting cdm.fileMaxtSet, while we should for every case. Do you think you can do that in this PR?

@codesome
Copy link
Member

I have updated the PR description to exclude #5617 as that is unrelated to m-mapping

@lentil1016
Copy link
Contributor Author

lentil1016 commented Aug 26, 2020

@codesome Hi, yesterday I meet some problem. The log tolds me that it meet some expection on loading head chunk with ref 6778345.

level=info ts=2020-08-24T02:39:28.116Z caller=head.go:641 component=tsdb msg="Replaying on-disk memory mappable chunks if any"
level=error ts=2020-08-24T02:39:32.539Z caller=head.go:646 component=tsdb msg="Loading on-disk chunks failed" err="iterate on on-disk chunks: out of sequence m-mapped chunk for series ref 6778345"
level=info ts=2020-08-24T02:39:32.539Z caller=head.go:757 component=tsdb msg="Deleting mmapped chunk files"
level=info ts=2020-08-24T02:39:32.539Z caller=head.go:760 component=tsdb msg="Deletion of mmap chunk files failed, discarding chunk files completely" err="cannot handle error: iterate on on-disk chunks: out of sequence m-mapped chunk for series ref 6778345"
level=info ts=2020-08-24T02:39:32.539Z caller=head.go:655 component=tsdb msg="On-disk memory mappable chunks replay completed" duration=4.423135929s

It seems that this chunk have an unexpected mint so it cause a return at here with an normal error, then at here, This error is a "cannot handle error", because it's not a CorruptionErr.

Every compact trying after then will failed. Even restart won't help, because this chunk is "cannot handle" and won't be removed. Data will become unvisible 2h by 2h.

level=error ts=2020-08-25T09:00:32.352Z caller=db.go:685 component=tsdb msg="compaction failed" err="reload blocks: head truncate failed: truncate chunks.HeadReadWriter: maxt of the files are not set"
level=info ts=2020-08-25T09:06:14.397Z caller=compact.go:495 component=tsdb msg="write block" mint=1598277600000 maxt=1598284800000 ulid=01EGJDKQT3TGPE2K6NVXWEV93P duration=4m42.041737936s
level=error ts=2020-08-25T09:06:23.632Z caller=db.go:685 component=tsdb msg="compaction failed" err="reload blocks: head truncate failed: truncate chunks.HeadReadWriter: maxt of the files are not set"
level=info ts=2020-08-25T09:11:44.949Z caller=compact.go:495 component=tsdb msg="write block" mint=1598284800000 maxt=1598292000000 ulid=01EGJDYEVX1EMW8M2BG6HJWA9G duration=4m21.303919073s
level=error ts=2020-08-25T09:11:52.854Z caller=db.go:685 component=tsdb msg="compaction failed" err="reload blocks: head truncate failed: truncate chunks.HeadReadWriter: maxt of the files are not set"
level=info ts=2020-08-25T09:17:17.827Z caller=compact.go:495 component=tsdb msg="write block" mint=1598292000000 maxt=1598299200000 ulid=01EGJE8GBPGND1YX2BX28G4WZK duration=4m24.973065568s
level=error ts=2020-08-25T09:17:26.128Z caller=db.go:685 component=tsdb msg="compaction failed" err="reload blocks: head truncate failed: truncate chunks.HeadReadWriter: maxt of the files are not set"
level=info ts=2020-08-25T09:22:29.737Z caller=compact.go:495 component=tsdb msg="write block" mint=1598299200000 maxt=1598306400000 ulid=01EGJEJNTH4718GQQB8A2TQEPB duration=4m3.6077139s
level=error ts=2020-08-25T09:22:37.816Z caller=db.go:685 component=tsdb msg="compaction failed" err="reload blocks: head truncate failed: truncate chunks.HeadReadWriter: maxt of the files are not set"

I'm not sure, but I believe this chunk is corrupted. And I think an iterator function should be able to return an corruptionErr to let the caller know that this chunk have a bad metadata and should be removed.

@codesome
Copy link
Member

out of sequence m-mapped chunk for series ref 6778345

Hm, true, its better to delete m-mapped chunks from that point.

we are not setting cdm.fileMaxtSet, while we should for every case.

We should also take care of this

Dir: cdm.dir.Name(),
FileIndex: segID,
}
if err := f(seriesRef, chunkRef, mint, maxt, numSamples, chunkErrTpl); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be just wrapping this error with corruption error and return and nothing else. You don't need additional argument for f.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ops... sorry, my brainfart :p

@codesome
Copy link
Member

cc @roidelapluie, it would be nice to get this into 2.21

@lentil1016 lentil1016 force-pushed the fix-head-out-of-order-deletion branch from 8747da7 to 207661f Compare August 26, 2020 14:04
@lentil1016 lentil1016 changed the title [WIP] fix: return a corruption error when chunk head out of sequence fix: return a corruption error when chunk head out of sequence Aug 26, 2020
Copy link
Member

@codesome codesome left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks

tsdb/chunks/head_chunks.go Outdated Show resolved Hide resolved
@lentil1016 lentil1016 force-pushed the fix-head-out-of-order-deletion branch from 207661f to 95afbd3 Compare August 26, 2020 14:53
@codesome codesome merged commit cfd4e05 into prometheus:master Aug 26, 2020
@lentil1016 lentil1016 deleted the fix-head-out-of-order-deletion branch August 27, 2020 01:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants