bootutil: Fix device bricked after power failure during swap-move revert #2100

Open
wants to merge 1 commit into
base: main
Contributor

@taltenbach taltenbach commented Oct 16, 2024

This PR proposes a fix to #1966, which describes a scenario where a device can be bricked if a revert process is interrupted when using swap-move.

As suggested in this message, a very straightforward fix might be enough; this PR implements it.

The idea is to perform a revert regardless of the state of the magic number in the secondary slot's trailer, provided the copy-done flag is set in the primary slot but the image-ok flag is not. The copy-done flag is set only after an upgrade or revert process has completed, so if copy-done is set but image-ok is unset, an upgrade is guaranteed to have been performed without the new image being confirmed, which implies a revert is needed.

That looks good to me, but perhaps I missed some corner cases that would justify using BOOT_MAGIC_UNSET instead of BOOT_MAGIC_ANY. @utzig @d3zd3z do you have any input on that point?

Fixes #1966

Let's suppose after an upgrade you have a non-functional image in the
primary slot. The image won't be confirmed, leading to a revert at next
boot. At the beginning of the revert process, fixup_revert is invoked,
which rewrites the trailer in the secondary slot so that the revert
looks like a permanent upgrade. Normally, after the execution of this
routine, the secondary slot has a valid trailer, in particular with a
valid magic number.

Now suppose a power failure occurs while the trailer's magic is being
written, i.e. in boot_write_magic. The magic number in the secondary
slot is then in an undefined state and might be partially written, so
at the next boot it will be classified as BOOT_MAGIC_BAD.
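
To illustrate why a partially written magic ends up in the bad state, here is a minimal sketch of a three-way magic classification. It is not MCUboot's actual code: the `decode_magic` helper, the `magic_state` enum, the 16-byte length and the placeholder pattern in `expected_magic` are all assumptions made for this example, as is the premise that erased flash reads back as a single repeated byte.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAGIC_SZ 16 /* assumed length for this sketch */

/* Hypothetical stand-ins for the good / bad / unset magic states. */
enum magic_state { MAGIC_GOOD, MAGIC_BAD, MAGIC_UNSET };

/* Placeholder pattern for illustration only, not the real magic value. */
static const uint8_t expected_magic[MAGIC_SZ] = {
    0x10, 0x21, 0x32, 0x43, 0x54, 0x65, 0x76, 0x87,
    0x98, 0xa9, 0xba, 0xcb, 0xdc, 0xed, 0xfe, 0x0f,
};

/*
 * Classify the bytes read from the trailer's magic field:
 * - exact match with the expected pattern: good
 * - every byte still equal to the erased value: unset
 * - anything else (e.g. a write cut short by a power failure): bad
 */
static enum magic_state decode_magic(const uint8_t *buf, uint8_t erased_val)
{
    size_t i;

    if (memcmp(buf, expected_magic, MAGIC_SZ) == 0) {
        return MAGIC_GOOD;
    }
    for (i = 0; i < MAGIC_SZ; i++) {
        if (buf[i] != erased_val) {
            return MAGIC_BAD;
        }
    }
    return MAGIC_UNSET;
}
```

With this model, a magic field where only the first few bytes were programmed before power was lost is neither fully erased nor equal to the expected pattern, so it falls into the bad category.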

So, at next boot, we have the following state:
Primary slot: magic=good, copy-done=set, image-ok=unset
Secondary slot: magic=bad, copy-done=unset, image-ok=set

This doesn't match any state that triggers an upgrade or revert
process, which means MCUboot will not perform the revert and will
attempt to boot from the primary slot, which contains a non-functional
image. Hence, the device is bricked unless it is possible to reflash
the secondary slot without relying on a functional image.
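
The mismatch can be reproduced with a small model of a rule table that maps trailer states to a swap type. This is a simplified sketch, not a copy of MCUboot's table: the type names, the `pick_swap` helper and the exact rows are assumptions chosen to mirror the behaviour described here, in particular a revert row that requires the secondary magic to be unset.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the magic, flag and swap-type values. */
enum magic { MAGIC_GOOD, MAGIC_BAD, MAGIC_UNSET, MAGIC_ANY };
enum flag  { FLAG_SET, FLAG_UNSET, FLAG_ANY };
enum swap  { SWAP_NONE, SWAP_TEST, SWAP_PERM, SWAP_REVERT };

struct trailer {
    enum magic magic;
    enum flag copy_done;
    enum flag image_ok;
};

struct swap_rule {
    enum magic magic_primary;
    enum magic magic_secondary;
    enum flag image_ok_primary;
    enum flag image_ok_secondary;
    enum flag copy_done_primary;
    enum swap result;
};

/* Assumed pre-fix rules: the revert row insists on an unset secondary magic. */
static const struct swap_rule rules_before_fix[] = {
    { MAGIC_ANY,  MAGIC_GOOD,  FLAG_ANY,   FLAG_UNSET, FLAG_ANY, SWAP_TEST },
    { MAGIC_ANY,  MAGIC_GOOD,  FLAG_ANY,   FLAG_SET,   FLAG_ANY, SWAP_PERM },
    { MAGIC_GOOD, MAGIC_UNSET, FLAG_UNSET, FLAG_ANY,   FLAG_SET, SWAP_REVERT },
};

#define RULE_COUNT (sizeof(rules_before_fix) / sizeof(rules_before_fix[0]))

static int magic_ok(enum magic want, enum magic have)
{
    return want == MAGIC_ANY || want == have;
}

static int flag_ok(enum flag want, enum flag have)
{
    return want == FLAG_ANY || want == have;
}

/* Return the result of the first matching rule, or SWAP_NONE if none match. */
static enum swap pick_swap(const struct trailer *pri, const struct trailer *sec)
{
    size_t i;

    for (i = 0; i < RULE_COUNT; i++) {
        const struct swap_rule *r = &rules_before_fix[i];

        if (magic_ok(r->magic_primary, pri->magic) &&
            magic_ok(r->magic_secondary, sec->magic) &&
            flag_ok(r->image_ok_primary, pri->image_ok) &&
            flag_ok(r->image_ok_secondary, sec->image_ok) &&
            flag_ok(r->copy_done_primary, pri->copy_done)) {
            return r->result;
        }
    }
    return SWAP_NONE;
}
```

Feeding in the state listed above (primary: good, copy-done set, image-ok unset; secondary: bad, copy-done unset, image-ok set) yields `SWAP_NONE` in this model, whereas the same primary state with an unset secondary magic yields `SWAP_REVERT`.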

To avoid this issue, a revert is performed regardless of the state of
the magic number in the secondary slot's trailer, provided the
copy-done flag is set in the primary slot but the image-ok flag is not.
The copy-done flag is set only after an upgrade or revert process has
completed, so if copy-done is set but image-ok is unset, an upgrade is
guaranteed to have been performed without the new image being
confirmed, which implies a revert is needed.
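
As a rough sketch of the relaxed condition, the two predicates below contrast a revert check that requires an unset secondary magic with one that ignores it. The function and type names are hypothetical, and the sketch models only the revert decision; it assumes the primary magic must still be good, which the text above does not state explicitly.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the good / bad / unset magic states. */
enum magic_state { MAGIC_GOOD, MAGIC_BAD, MAGIC_UNSET };

struct primary_trailer {
    enum magic_state magic;
    bool copy_done;
    bool image_ok;
};

/* Stricter check: revert only if the secondary magic is unset. */
static bool revert_before_fix(const struct primary_trailer *pri,
                              enum magic_state sec_magic)
{
    return pri->magic == MAGIC_GOOD && pri->copy_done && !pri->image_ok &&
           sec_magic == MAGIC_UNSET;
}

/* Relaxed check: the secondary magic no longer influences the decision. */
static bool revert_after_fix(const struct primary_trailer *pri,
                             enum magic_state sec_magic)
{
    (void)sec_magic; /* deliberately ignored */
    return pri->magic == MAGIC_GOOD && pri->copy_done && !pri->image_ok;
}
```

In this model, a primary trailer with copy-done set and image-ok unset triggers a revert under the relaxed check whether the secondary magic is bad or unset, while a confirmed image (image-ok set) never does.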

Signed-off-by: Thomas Altenbach <[email protected]>
@taltenbach taltenbach changed the title bootutil: Fix device brick after power failure during swap-move revert bootutil: Fix device bricked after power failure during swap-move revert Oct 16, 2024
Development

Successfully merging this pull request may close these issues.

swap-move: Power failure during the writing of the magic could brick the device
1 participant