storage: fix batch chunk runtime error #1355

Merged
merged 8 commits into from
Jan 15, 2024

Conversation

hangvane
Contributor

@hangvane hangvane commented Jul 7, 2023

Relevant Issue (if applicable)

If there are Issues related to this PullRequest, please list it.

Details

  • Fix a batch chunk runtime error that causes a core dump when running containers built with `--batch-size` enabled.
  • Improve the runtime performance of batch chunks.

Types of changes

What types of changes does your PullRequest introduce? Put an x in all the boxes that apply:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation Update (if none of the other choices apply)

Checklist

Go over all the following points, and put an x in all the boxes that apply.

  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.

@hangvane hangvane requested a review from a team as a code owner July 7, 2023 08:05
@hangvane hangvane requested review from liubin, luodw and adamqqqplay and removed request for a team July 7, 2023 08:05
@anolis-bot
Collaborator

@hangvane , a new test job has been submitted. Please wait in patience. The test job url: https://tone.openanolis.cn/ws/nrh4nnio/test_result/83345

@codecov

codecov bot commented Jul 7, 2023

Codecov Report

Attention: 257 lines in your changes are missing coverage. Please review.

Comparison is base (596492b) 61.29% compared to head (c91f9fc) 61.39%.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1355      +/-   ##
==========================================
+ Coverage   61.29%   61.39%   +0.09%     
==========================================
  Files         144      144              
  Lines       46597    46962     +365     
  Branches    44133    44498     +365     
==========================================
+ Hits        28562    28832     +270     
- Misses      16589    16646      +57     
- Partials     1446     1484      +38     
| Files | Coverage Δ |
| --- | --- |
| storage/src/meta/batch.rs | 97.60% <100.00%> (-0.11%) ⬇️ |
| storage/src/test.rs | 51.92% <100.00%> (+2.94%) ⬆️ |
| src/bin/nydus-image/main.rs | 0.71% <0.00%> (ø) |
| builder/src/core/blob.rs | 41.33% <0.00%> (ø) |
| builder/src/compact.rs | 80.32% <0.00%> (-0.25%) ⬇️ |
| rafs/src/metadata/cached_v5.rs | 80.76% <0.00%> (-0.30%) ⬇️ |
| rafs/src/metadata/direct_v5.rs | 56.31% <0.00%> (-0.31%) ⬇️ |
| rafs/src/metadata/layout/v5.rs | 85.23% <0.00%> (-0.21%) ⬇️ |
| rafs/src/metadata/md_v5.rs | 28.94% <0.00%> (-0.47%) ⬇️ |
| rafs/src/mock/mock_chunk.rs | 90.66% <0.00%> (-3.78%) ⬇️ |
... and 10 more

... and 3 files with indirect coverage changes

@anolis-bot
Collaborator

@hangvane , The CI test is completed, please check result:

| Test Case | Test Result |
| --- | --- |
| nydus_ci | ❌ FAIL |

Sorry, your test job failed. Please get the details in the link.

@Desiki-high
Member

/retest

@anolis-bot
Collaborator

@Desiki-high , the test job has been submitted. Please wait in patience. The test job url: https://tone.openanolis.cn/ws/nrh4nnio/test_result/83380

@anolis-bot
Collaborator

@Desiki-high , The CI test is completed, please check result:

| Test Case | Test Result |
| --- | --- |
| build rust golang image | ✅ SUCCESS |
| compile nydusd | ✅ SUCCESS |
| compile ctr remote | ✅ SUCCESS |
| compile nydus snapshotter | ✅ SUCCESS |
| run container with rafs | ✅ SUCCESS |
| run container with zran | ✅ SUCCESS |
| run container with rafs and compile linux | ❌ FAIL |

Sorry, your test job failed. Please get the details in the link.

@hangvane
Contributor Author

hangvane commented Jul 8, 2023

/retest

@anolis-bot
Collaborator

@hangvane , the test job has been submitted. Please wait in patience. The test job url: https://tone.openanolis.cn/ws/nrh4nnio/test_result/83444

@anolis-bot
Collaborator

@hangvane , The CI test is completed, please check result:

| Test Case | Test Result |
| --- | --- |
| build rust golang image | ✅ SUCCESS |
| compile nydusd | ✅ SUCCESS |
| compile ctr remote | ✅ SUCCESS |
| compile nydus snapshotter | ✅ SUCCESS |
| run container with rafs | ✅ SUCCESS |
| run container with zran | ✅ SUCCESS |
| run container with rafs and compile linux | ❌ FAIL |

Sorry, your test job failed. Please get the details in the link.

@hangvane
Contributor Author

hangvane commented Jul 8, 2023

/retest

@anolis-bot
Collaborator

@hangvane , the test job has been submitted. Please wait in patience. The test job url: https://tone.openanolis.cn/ws/nrh4nnio/test_result/83446

@hangvane hangvane force-pushed the nydusd branch 4 times, most recently from 0b8c1de to 62c345e Compare October 24, 2023 02:45
@imeoer
Collaborator

imeoer commented Oct 25, 2023

@jiangliu Please take a look at this again, thanks!

@hangvane
Contributor Author

Great job! Please add an e2e test for build/run with batch mode that reproduces the bug on master and is resolved by your PR.

E2E tests covering batch mode have already been added to master,

Dimension(paramEnablePrefetch, []interface{}{false, true}).
Dimension(paramBatch, []interface{}{"0", "0x100000"}).
Dimension(paramEncrypt, []interface{}{false, true}).

and they may fail intermittently. However, the logs of the failed run have expired and can no longer be traced.
https://github.com/dragonflyoss/nydus/actions/runs/5452176504

@Desiki-high
Member

@ccx1024cc Please take a look at this again, thanks!

…fo for batch chunks.

1. `compressed_offset` is used to sort chunk info at both build time and runtime.
So we move `compressed_offset` from `BatchInflateContext` to the chunk info for batch chunks.

2. The `compressed_size` of blobs in batch mode is not set correctly.
We fix this by setting the value of `dumped_size`.

Signed-off-by: Wenhao Ren <[email protected]>
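
As an illustration of the first point in the commit message above, here is a minimal sketch of sorting chunk records by a `compressed_offset` stored on the chunk itself. The `ChunkInfo` struct, its field names, and `sort_chunks` are hypothetical stand-ins, not the nydus types; the sketch also assumes that all chunks of one batch share the batch's `compressed_offset`, so a second key is needed to order them within the batch.

```rust
// Hypothetical, simplified chunk record; not the real nydus chunk info type.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ChunkInfo {
    index: u32,
    // Assumption: for batch chunks this is the offset of the compressed batch
    // in the blob, so chunks of the same batch carry the same value.
    compressed_offset: u64,
    // Assumption: position of the chunk's data inside the decompressed batch.
    uncompressed_offset_in_batch: u32,
}

// Sort by compressed offset first, then by position inside the batch, so that
// chunks sharing a batch stay adjacent and in a stable, predictable order.
fn sort_chunks(chunks: &mut [ChunkInfo]) {
    chunks.sort_by_key(|c| (c.compressed_offset, c.uncompressed_offset_in_batch));
}

fn main() {
    let mut chunks = vec![
        ChunkInfo { index: 2, compressed_offset: 4096, uncompressed_offset_in_batch: 8192 },
        ChunkInfo { index: 0, compressed_offset: 0, uncompressed_offset_in_batch: 0 },
        ChunkInfo { index: 1, compressed_offset: 4096, uncompressed_offset_in_batch: 0 },
    ];
    sort_chunks(&mut chunks);
    for c in &chunks {
        println!("{:?}", c);
    }
}
```

Keeping the sort key on the chunk record means the sort does not need to look up a separate per-batch context for every comparison.
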
Currently, many errors are output only as `os error 22`, losing the customized log info.
So we change the Error type so that the error info is output and logged
as expected.

Signed-off-by: Wenhao Ren <[email protected]>
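
The symptom described in the commit message above can be reproduced with the Rust standard library alone. This is only a sketch of the general effect, not the change made in this PR: an `std::io::Error` built from a raw errno prints the OS description and `os error 22`, while one built with `Error::new` prints the caller's message. The helper names and the sample message are hypothetical.

```rust
use std::io::{Error, ErrorKind};

// An error built from a raw errno carries no caller-supplied text.
// 22 is EINVAL on Linux.
fn einval_raw() -> Error {
    Error::from_raw_os_error(22)
}

// An error built with a custom payload keeps the caller's message.
fn einval_with_context(msg: &str) -> Error {
    Error::new(ErrorKind::InvalidInput, msg.to_string())
}

fn main() {
    // Displays the OS description followed by "(os error 22)".
    println!("{}", einval_raw());
    // Displays the custom message instead.
    println!("{}", einval_with_context("invalid batch chunk index"));
}
```
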
Read amplification for batch chunks is not implemented correctly and may crash.
Read amplification is rewritten to fix this bug.
A unit test for read amplification is also added to cover this code.

Signed-off-by: Wenhao Ren <[email protected]>
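
To make the idea of batch-aware read amplification concrete, here is a rough sketch under stated assumptions; it is not the algorithm implemented in this PR. It assumes chunks of one batch are adjacent in the chunk list, that a batch can only be fetched as a whole, and that amplification appends following chunks (or whole batches) while a size budget allows. `Chunk`, `batch_bounds`, and `amplify` are hypothetical names.

```rust
// Hypothetical chunk record: `batch` is an illustrative batch id,
// `None` means the chunk is not part of any batch.
#[derive(Debug, Clone, Copy)]
struct Chunk {
    batch: Option<u32>,
    uncompressed_size: u64,
}

// Returns the inclusive index range of the batch containing `idx`,
// or (idx, idx) for a non-batch chunk.
fn batch_bounds(chunks: &[Chunk], idx: usize) -> (usize, usize) {
    match chunks[idx].batch {
        None => (idx, idx),
        Some(b) => {
            let mut lo = idx;
            while lo > 0 && chunks[lo - 1].batch == Some(b) {
                lo -= 1;
            }
            let mut hi = idx;
            while hi + 1 < chunks.len() && chunks[hi + 1].batch == Some(b) {
                hi += 1;
            }
            (lo, hi)
        }
    }
}

// Widens the inclusive range [start, end] to whole batches, then appends
// following units while the total uncompressed size stays within `max_size`.
// The widened request itself is always kept, even if it exceeds `max_size`.
fn amplify(chunks: &[Chunk], start: usize, end: usize, max_size: u64) -> Option<(usize, usize)> {
    if start > end || end >= chunks.len() {
        return None; // reject invalid ranges instead of indexing out of bounds
    }
    let lo = batch_bounds(chunks, start).0;
    let mut hi = batch_bounds(chunks, end).1;
    let mut total: u64 = chunks[lo..=hi].iter().map(|c| c.uncompressed_size).sum();
    while hi + 1 < chunks.len() {
        let next_hi = batch_bounds(chunks, hi + 1).1;
        let extra: u64 = chunks[hi + 1..=next_hi].iter().map(|c| c.uncompressed_size).sum();
        if total + extra > max_size {
            break;
        }
        total += extra;
        hi = next_hi;
    }
    Some((lo, hi))
}

fn main() {
    let chunks = [
        Chunk { batch: None, uncompressed_size: 10 },
        Chunk { batch: Some(0), uncompressed_size: 4 },
        Chunk { batch: Some(0), uncompressed_size: 4 },
        Chunk { batch: Some(0), uncompressed_size: 4 },
        Chunk { batch: None, uncompressed_size: 10 },
    ];
    println!("{:?}", amplify(&chunks, 2, 2, 25));
}
```

Validating the range up front and never splitting a batch are the two properties such a routine needs in order to avoid out-of-bounds accesses on batch chunks.
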
`BlobCompressionContextInfo` is needed to read batch chunk info.
`BlobCCI` is introduced to simplify the code
and to reduce the number of times this context is fetched by loading it lazily.

Signed-off-by: Wenhao Ren <[email protected]>
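
The lazy-loading idea mentioned in the commit message above can be sketched as a small wrapper that fetches a context on first use and caches it afterwards. This is a generic illustration, not the `BlobCCI` implementation; `CompressionInfo`, `LazyInfo`, and the `loads` counter are hypothetical.

```rust
// Hypothetical stand-in for a context that is costly to obtain.
struct CompressionInfo {
    batch_count: u32,
}

// Wrapper that calls `loader` at most once, on first access.
struct LazyInfo<F>
where
    F: Fn() -> Result<CompressionInfo, String>,
{
    loader: F,
    cached: Option<CompressionInfo>,
    loads: u32, // counts successful loads, for illustration only
}

impl<F> LazyInfo<F>
where
    F: Fn() -> Result<CompressionInfo, String>,
{
    fn new(loader: F) -> Self {
        LazyInfo { loader, cached: None, loads: 0 }
    }

    // Returns the cached context, loading it first if needed.
    // A failed load is not cached, so a later call retries.
    fn get(&mut self) -> Result<&CompressionInfo, String> {
        if self.cached.is_none() {
            self.cached = Some((self.loader)()?);
            self.loads += 1;
        }
        Ok(self.cached.as_ref().expect("cache was populated above"))
    }
}

fn main() {
    let mut info = LazyInfo::new(|| Ok(CompressionInfo { batch_count: 3 }));
    // Callers that never touch batch chunks never trigger the load.
    for _ in 0..4 {
        let n = info.get().unwrap().batch_count;
        println!("batch_count = {}", n);
    }
    println!("loader invoked {} time(s)", info.loads);
}
```
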
1. Add validation for batch chunks.
2. Add a unit test for `BatchInflateContext`.

Signed-off-by: Wenhao Ren <[email protected]>
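
As a rough idea of what validating a batch chunk can involve, the sketch below checks that a chunk's range lies inside its decompressed batch. The `BatchContext` struct, its fields, and the specific rules are assumptions made for illustration; the actual checks added by this PR are not shown on this page.

```rust
// Hypothetical per-batch record; field names are illustrative only.
struct BatchContext {
    compressed_size: u32,
    uncompressed_batch_size: u32,
}

// Checks that a chunk of `chunk_size` bytes starting at `offset_in_batch`
// fits inside the decompressed batch described by `ctx`.
fn validate_batch_chunk(
    ctx: &BatchContext,
    offset_in_batch: u32,
    chunk_size: u32,
) -> Result<(), String> {
    if ctx.compressed_size == 0 || ctx.uncompressed_batch_size == 0 {
        return Err("batch has zero size".to_string());
    }
    if chunk_size == 0 {
        return Err("chunk has zero size".to_string());
    }
    // checked_add guards against wrap-around on corrupted metadata.
    let end = offset_in_batch
        .checked_add(chunk_size)
        .ok_or_else(|| "chunk range overflows u32".to_string())?;
    if end > ctx.uncompressed_batch_size {
        return Err(format!(
            "chunk range {}..{} exceeds batch size {}",
            offset_in_batch, end, ctx.uncompressed_batch_size
        ));
    }
    Ok(())
}

fn main() {
    let ctx = BatchContext { compressed_size: 512, uncompressed_batch_size: 4096 };
    println!("{:?}", validate_batch_chunk(&ctx, 0, 1024));
    println!("{:?}", validate_batch_chunk(&ctx, 4000, 1024));
}
```
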
By passing the chunk continuity check and correctly sorting batch chunks,
prefetch requests are no longer interrupted by batch chunks.

Signed-off-by: Wenhao Ren <[email protected]>
@Desiki-high Desiki-high merged commit 9dae4ec into dragonflyoss:master Jan 15, 2024
22 checks passed
7 participants