-
Notifications
You must be signed in to change notification settings - Fork 463
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
wal: reader batch corruption #3865
Labels
A-storage
C-bug
Something isn't working
GA-blocker
P-0
Issues/test failures with a fix SLA of 2 weeks
T-storage
Comments
jbowens
added a commit
to jbowens/pebble
that referenced
this issue
Aug 19, 2024
While stitching together physical WAL segments, the WAL reader reads each record's contents into a temporary buffer. Previously, if an error was encountered mid-copy of a record (eg, because the record was large enough to be split across multiple chunks), the buffer was not reset before attempting to read the next file's record. This corruption could prevent replay of the WAL. This commit fixes the bug moving the buffer reset to immediately precede its use. Informs cockroachdb#3865.
exalate-issue-sync
bot
added
the
P-0
Issues/test failures with a fix SLA of 2 weeks
label
Aug 20, 2024
jbowens
added a commit
that referenced
this issue
Aug 20, 2024
While stitching together physical WAL segments, the WAL reader reads each record's contents into a temporary buffer. Previously, if an error was encountered mid-copy of a record (eg, because the record was large enough to be split across multiple chunks), the buffer was not reset before attempting to read the next file's record. This corruption could prevent replay of the WAL. This commit fixes the bug moving the buffer reset to immediately precede its use. Informs #3865.
jbowens
added a commit
to jbowens/pebble
that referenced
this issue
Aug 20, 2024
While stitching together physical WAL segments, the WAL reader reads each record's contents into a temporary buffer. Previously, if an error was encountered mid-copy of a record (eg, because the record was large enough to be split across multiple chunks), the buffer was not reset before attempting to read the next file's record. This corruption could prevent replay of the WAL. This commit fixes the bug moving the buffer reset to immediately precede its use. Informs cockroachdb#3865.
jbowens
added a commit
to jbowens/pebble
that referenced
this issue
Aug 20, 2024
While stitching together physical WAL segments, the WAL reader reads each record's contents into a temporary buffer. Previously, if an error was encountered mid-copy of a record (eg, because the record was large enough to be split across multiple chunks), the buffer was not reset before attempting to read the next file's record. This corruption could prevent replay of the WAL. This commit fixes the bug moving the buffer reset to immediately precede its use. Informs cockroachdb#3865.
What sequence of user actions led to this issue being uncovered? We should add a roach test operation for this scenario |
dshjoshi
moved this from Incoming
to In Progress (this milestone)
in [Deprecated] Storage
Aug 20, 2024
craig bot
pushed a commit
to cockroachdb/cockroach
that referenced
this issue
Aug 20, 2024
129331: go.mod: bump Pebble to 3d14906a0e0c r=itsbilal a=jbowens Changes: * [`3d14906a`](cockroachdb/pebble@3d14906a) wal: fix reader buffer reuse * [`c2749cc7`](cockroachdb/pebble@c2749cc7) colblk: use crlib/crbytes.CommonPrefix * [`f90b350e`](cockroachdb/pebble@f90b350e) docs/rfc: add RFC for point tombstone density compaction heuristic * [`1a5295f8`](cockroachdb/pebble@1a5295f8) block: move block handle encoding and decoding * [`0e8a11a5`](cockroachdb/pebble@0e8a11a5) metamorphic: no synthetic suffix with overlapping RangeKeySets Release note (bug fix): Fix bug in the public preview WAL failover feature that could prevent a node from starting if it crashed during a failover. Epic: none. Informs cockroachdb/pebble#3865. Co-authored-by: Jackson Owens <[email protected]>
jbowens
added a commit
that referenced
this issue
Aug 20, 2024
While stitching together physical WAL segments, the WAL reader reads each record's contents into a temporary buffer. Previously, if an error was encountered mid-copy of a record (eg, because the record was large enough to be split across multiple chunks), the buffer was not reset before attempting to read the next file's record. This corruption could prevent replay of the WAL. This commit fixes the bug moving the buffer reset to immediately precede its use. Informs #3865.
jbowens
added a commit
that referenced
this issue
Aug 20, 2024
While stitching together physical WAL segments, the WAL reader reads each record's contents into a temporary buffer. Previously, if an error was encountered mid-copy of a record (eg, because the record was large enough to be split across multiple chunks), the buffer was not reset before attempting to read the next file's record. This corruption could prevent replay of the WAL. This commit fixes the bug moving the buffer reset to immediately precede its use. Informs #3865.
jbowens
added a commit
to jbowens/cockroach
that referenced
this issue
Aug 20, 2024
Changes: * [`02bb64a5`](cockroachdb/pebble@02bb64a5) wal: fix reader buffer reuse * [`72e9205d`](cockroachdb/pebble@72e9205d) db: export MakeTrailer * [`37311795`](cockroachdb/pebble@37311795) internal/keyspan: export Truncate in keyspan.Fragmenter. Release note (bug fix): Fix bug in the public preview WAL failover feature that could prevent a node from starting if it crashed during a failover. Epic: none. Epic: none. Informs cockroachdb/pebble#3865.
This was referenced Aug 20, 2024
jbowens
added a commit
to jbowens/cockroach
that referenced
this issue
Aug 20, 2024
Changes: * [`211dce02`](cockroachdb/pebble@211dce02) wal: fix reader buffer reuse * [`cd809634`](cockroachdb/pebble@cd809634) db: export MakeTrailer * [`f318ad84`](cockroachdb/pebble@f318ad84) internal/keyspan: export Truncate in keyspan.Fragmenter. Release note (bug fix): Fix bug in the public preview WAL failover feature that could prevent a node from starting if it crashed during a failover. Epic: none. Epic: none. Epic: none. Informs cockroachdb/pebble#3865.
jbowens
added a commit
to jbowens/pebble
that referenced
this issue
Aug 27, 2024
Add a randomized test of WAL failover and recovery of a DB from failover WAL logs. Close cockroachdb#3865.
sumeerbhola
added a commit
to sumeerbhola/pebble
that referenced
this issue
Aug 28, 2024
…on open This also ensures we don't leak a file descriptor. Informs cockroachdb#3865
sumeerbhola
added a commit
to sumeerbhola/pebble
that referenced
this issue
Aug 28, 2024
…on open This also ensures we don't leak a file descriptor. Informs cockroachdb#3865
sumeerbhola
added a commit
to sumeerbhola/pebble
that referenced
this issue
Aug 28, 2024
…on open This also ensures we don't leak a file descriptor. Informs cockroachdb#3865
jbowens
added a commit
to jbowens/pebble
that referenced
this issue
Aug 28, 2024
Add a randomized test of WAL failover and recovery of a DB from failover WAL logs. Close cockroachdb#3865.
sumeerbhola
added a commit
that referenced
this issue
Aug 28, 2024
…on open This also ensures we don't leak a file descriptor. Informs #3865
jbowens
added a commit
to jbowens/pebble
that referenced
this issue
Aug 28, 2024
Add a randomized test of WAL failover and recovery of a DB from failover WAL logs. Close cockroachdb#3865.
jbowens
added a commit
that referenced
this issue
Aug 28, 2024
Add a randomized test of WAL failover and recovery of a DB from failover WAL logs. Close #3865.
github-project-automation
bot
moved this from In Progress (this milestone)
to Done
in [Deprecated] Storage
Aug 28, 2024
jbowens
pushed a commit
to jbowens/pebble
that referenced
this issue
Aug 28, 2024
…on open This also ensures we don't leak a file descriptor. Informs cockroachdb#3865
jbowens
pushed a commit
to jbowens/pebble
that referenced
this issue
Aug 28, 2024
…on open This also ensures we don't leak a file descriptor. Informs cockroachdb#3865
jbowens
pushed a commit
that referenced
this issue
Aug 28, 2024
…on open This also ensures we don't leak a file descriptor. Informs #3865
jbowens
pushed a commit
that referenced
this issue
Aug 28, 2024
…on open This also ensures we don't leak a file descriptor. Informs #3865
@jbowens Have we already added " add randomized test of WAL failover and recovery"? |
Yes |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
A-storage
C-bug
Something isn't working
GA-blocker
P-0
Issues/test failures with a fix SLA of 2 weeks
T-storage
While stitching together physical WAL segments, the WAL reader reads each record's contents into a temporary buffer. There's a bug where an error encountered mid-copy of a record (eg, because the record was large enough to be
split across multiple chunks), the buffer was not reset before attempting to read the next file's record. This corruption could prevent replay of the WAL.
Jira issue: PEBBLE-244
The text was updated successfully, but these errors were encountered: