How did this show up? Do you have any manual tests or unit tests to verify this modification?
Current restore process:
If downloading returns ERR_CORRUPT, meta should not retry the restore, because the checkpoints on the remote provider are corrupted. The current implementation of meta doesn't handle this situation, so the restore process repeats forever and cannot stop.
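The retry policy described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: `error_code`, `restore_result`, `restore_with_retry`, and `ERR_TIMEOUT` are hypothetical names, and the `download` callback stands in for whatever actually fetches a checkpoint. The only behavior taken from the discussion is that a corruption error must end the restore instead of triggering another attempt.

```cpp
#include <functional>

// Hypothetical error codes for illustration; not the project's real definitions.
enum class error_code { ERR_OK, ERR_TIMEOUT, ERR_CORRUPTION };

// Final status of a restore run and the number of attempts it took.
struct restore_result {
    error_code status;
    int attempts;
};

// Calls `download` up to `max_attempts` times (assumed to be >= 1).
// Transient errors are retried; corruption stops the loop at once, because
// downloading a corrupted remote checkpoint again cannot succeed.
inline restore_result restore_with_retry(const std::function<error_code()> &download,
                                         int max_attempts)
{
    restore_result result{error_code::ERR_OK, 0};
    while (result.attempts < max_attempts) {
        ++result.attempts;
        result.status = download();
        if (result.status == error_code::ERR_OK ||
            result.status == error_code::ERR_CORRUPTION) {
            break; // success or permanent failure: do not retry
        }
        // Any other error is treated as transient in this sketch and retried.
    }
    return result;
}
```

With a callback that always reports corruption, the loop runs exactly once no matter how large `max_attempts` is, which is the "stop instead of repeating forever" behavior requested here.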
Yes, I have done some manual tests to see whether the restore process stops or not.
In restore, _restore_status = ERR_CORRUPTION means that an error was encountered and the restore process should be stopped. When file.md5sum is not equal to the corresponding md5sum in the metadata, something is wrong with that file, so we should stop the restore process by setting _restore_status = ERR_CORRUPTION. Otherwise the restore process will repeat forever, as our system does now.
Manual Test
Action
Restore with corrupt checkpoint files.
Note that result is the new app created by the restore process.
Before Modification
There are many errors in the log file for each partition, which means the restore was retried many times:
And we can see that the result app exists forever.
After Modification
The error log occurs only once for each partition:
And the result app was deleted immediately when the error occurred.