Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

swap-move: Power failure during the writing of the magic could brick the device #1966

Open
taltenbach opened this issue May 27, 2024 · 1 comment · May be fixed by #2100
Open

swap-move: Power failure during the writing of the magic could brick the device #1966

taltenbach opened this issue May 27, 2024 · 1 comment · May be fixed by #2100

Comments

@taltenbach
Copy link
Contributor

taltenbach commented May 27, 2024

Description

While reading through the implementation of the swap using move, I noticed a corner case that could, if I'm not mistaken, brick the device in case a power failure occurs during a revert process.

Let's suppose after an upgrade you have a non-functional image in the primary slot. The image won't be confirmed, leading to a revert at next boot. At the beginning of the revert process, fixup_revert is invoked, which rewrites the trailer in the secondary slot so that the revert looks like a permanent upgrade. Normally, after the execution of this routine, the secondary slot has a valid trailer, in particular with a valid magic number.

Let's imagine a power failure occurs during the writing of the trailer's magic, i.e. in boot_write_magic (see here). The magic number in the secondary slot is in an undefined state and might be partially written, which implies at next boot it will be considered in BOOT_MAGIC_BAD state.

So, at next boot, we have the following state:

  • Primary slot: magic=good, copy-done=set, image-ok=unset
  • Secondary slot: magic=bad, copy-done=unset, image-ok=set

This doesn't match any state leading to an upgrade or revert process to be initiated, so boot_swap_type_multi will return BOOT_SWAP_TYPE_NONE. This means MCUboot will not perform the revert and attempt to boot from the primary slot, containing a non-functional image. Hence, the device is bricked unless it is possible to reflash the secondary slot without a functional image.

Note at the time fixup_revert is called, the boot status is virgin, meaning boot_status_is_reset(bs) is true at next boot so MCUboot won't detect a revert process was interrupted and won't try to resume it.

How to reproduce

To easily reproduce the issue, it is possible to modify the boot_write_magic routine to write the magic in two calls to flash_area_write. For instance, it is possible to replace:

rc = flash_area_write(fap, pad_off, &magic[0], BOOT_MAGIC_ALIGN_SIZE);

With:

rc = flash_area_write(fap, pad_off, &magic[0], BOOT_MAGIC_ALIGN_SIZE / 2);
if (rc != 0) {
    return BOOT_EFLASH;
}

rc = flash_area_write(fap, pad_off + BOOT_MAGIC_ALIGN_SIZE / 2, &magic[BOOT_MAGIC_ALIGN_SIZE / 2],
                      BOOT_MAGIC_ALIGN_SIZE - BOOT_MAGIC_ALIGN_SIZE / 2);

And to to run the tests for swap-move on the simulator:

cargo test --features "swap-move"

During the revert_with_fails test, a power failure is simulated during every modification to the flash memory, so in particular between the two flash_area_write in boot_write_magic. This will lead to an error:

[ERROR bootsim::image] Revert failed at interruption 5
[...]
[ERROR bootsim::image] Revert failed at interruption 236
test revert_with_fails ... FAILED
[...]
---- revert_with_fails stdout ----
thread 'revert_with_fails' panicked at sim/tests/core.rs:57:1:
assertion failed: !image.run_revert_with_fails()

I didn't try on real hardware, but I'm happy to do so if you consider that useful.

@taltenbach taltenbach changed the title swap-move: Power failure during magic writing could brick the device swap-move: Power failure during the writing of the magic could brick the device May 27, 2024
@taltenbach
Copy link
Contributor Author

taltenbach commented Jul 1, 2024

I had some time to look a bit further into this issue. A very simple fix would be to modify the boot_swap_tables in bootutil_public.c (see here) so that a revert is performed whatever the state of the magic number in the secondary slot. This means we would have:

    {
        .magic_primary_slot =       BOOT_MAGIC_GOOD,
        .magic_secondary_slot =     BOOT_MAGIC_ANY,         /* Was: BOOT_MAGIC_UNSET */
        .image_ok_primary_slot =    BOOT_FLAG_UNSET,
        .image_ok_secondary_slot =  BOOT_FLAG_ANY,
        .copy_done_primary_slot =   BOOT_FLAG_SET,
        .swap_type =                BOOT_SWAP_TYPE_REVERT,
    },

Intuitively, this sounds good to me: a revert is performed if and only if the copy-done flag is set in the primary slot but the image-ok flag is not. I also tested this change on the simulator and all tests passed.

@utzig @d3zd3z Are you aware of any reason why the BOOT_MAGIC_UNSET state is currently required for the secondary slot's trailer?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Development

Successfully merging a pull request may close this issue.

1 participant