Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jobprofiler: WriteChunkedFileToJobInfo can write corrupt files on rewrite with the same filename #113232

Closed
adityamaru opened this issue Oct 27, 2023 · 1 comment · Fixed by #113241
Assignees
Labels
A-disaster-recovery A-jobs branch-release-23.2 Used to mark GA and release blockers, technical advisories, and bugs for 23.2 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-jobs

Comments

@adityamaru
Copy link
Contributor

adityamaru commented Oct 27, 2023

WriteChunkedFileToJobInfo takes in a byte slice and a filename, chunks it up into 1mb chunks and then "writes" it to the job_info table. A write to the job_info table however is a delete of a previous row with the same info_key and a new row with the new bytes for that key. If a caller calls WriteChunkedFileToJobInfo > 1 and the chunking of the file changes because of a change in size of the byte slice then we may end up in a situation where some chunks correspond to the old file and some to the new. We should either add a unique ts to the chunk names, but this then means an overwrite doesn't get rid of the previous version. Alternatively, we should teach the chunk method to delete all previous rows with the same filename before attempting to write new chunks.

Jira issue: CRDB-32821

@adityamaru adityamaru added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-disaster-recovery T-jobs branch-release-23.2 Used to mark GA and release blockers, technical advisories, and bugs for 23.2 labels Oct 27, 2023
@blathers-crl
Copy link

blathers-crl bot commented Oct 27, 2023

cc @cockroachdb/disaster-recovery

@adityamaru adityamaru self-assigned this Oct 27, 2023
craig bot pushed a commit that referenced this issue Oct 30, 2023
113241: jobs: fix bug in WriteChunkedFileToJobInfo during overwriting r=stevendanna a=adityamaru

Previously, `WriteChunkedFileToJobInfo` would chunk up the passed in byte slice and write the chunks to the job_info table with info keys constructed using the filename. If the method were to be invoked again with the same filename, due to the delete before write semantics of the job info table, if the number of chunks changed then we'd end up with a corrupt file. With chunks from the first and second write mixed.

This change fixes the bug by first deleting all the chunks that correspond to the filename before writing the new data. This is in line with how you'd expect an overwrite operation to work. This change also adds a regression test for the same.

Fixes: #113232
Release note (bug fix): fixes a bug in a method that was used by some of the jobs observability infrastructure, that could be triggered if a file was overwrriten with a different chunking strategy

Co-authored-by: adityamaru <[email protected]>
@craig craig bot closed this as completed in 371e7c5 Oct 30, 2023
blathers-crl bot pushed a commit that referenced this issue Oct 30, 2023
Previously, `WriteChunkedFileToJobInfo` would chunk up the
passed in byte slice and write the chunks to the job_info table
with info keys constructed using the filename. If the method
were to be invoked again with the same filename, due to the
delete before write semantics of the job info table, if the
number of chunks changed then we'd end up with a corrupt
file. With chunks from the first and second write mixed.

This change fixes the bug by first deleting all the chunks that
correspond to the filename before writing the new data. This
is in line with how you'd expect an overwrite operation to
work. This change also adds a regression test for the same.

Fixes: #113232
Release note (bug fix): fixes a bug in a method that was used
by some of the jobs observability infrastructure, that could be
triggered if a file was overwrriten with a different chunking
strategy
xinhaoz pushed a commit to xinhaoz/cockroach that referenced this issue Oct 30, 2023
Previously, `WriteChunkedFileToJobInfo` would chunk up the
passed in byte slice and write the chunks to the job_info table
with info keys constructed using the filename. If the method
were to be invoked again with the same filename, due to the
delete before write semantics of the job info table, if the
number of chunks changed then we'd end up with a corrupt
file. With chunks from the first and second write mixed.

This change fixes the bug by first deleting all the chunks that
correspond to the filename before writing the new data. This
is in line with how you'd expect an overwrite operation to
work. This change also adds a regression test for the same.

Fixes: cockroachdb#113232
Release note (bug fix): fixes a bug in a method that was used
by some of the jobs observability infrastructure, that could be
triggered if a file was overwrriten with a different chunking
strategy
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-disaster-recovery A-jobs branch-release-23.2 Used to mark GA and release blockers, technical advisories, and bugs for 23.2 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-jobs
Projects
No open projects
Archived in project
Development

Successfully merging a pull request may close this issue.

1 participant