
Add support for non-recursive single-toolchain multi-image builds #13672

Conversation

SebastianBoe
Collaborator

@SebastianBoe SebastianBoe commented Feb 22, 2019

We are seeing the need to boot Zephyr applications in multiple
stages; a real-world use case of even a 3-stage bootloader has
been identified and tested.

Each bootloader needs to be individually upgradeable and have
its own configuration of Zephyr libraries. To achieve this, each
bootloader is organized as its own executable.

The problem is that the build system can only build one
executable. This creates a usability issue as the user must
invoke a large set of arcane commands to build, sign, and flash
each bootloader.

It also causes a problem for the configuration system. If the build and configuration
system can only build one image it will not be able to automatically share configuration
between images or enforce that one image is configured to be compatible with another
image. The user must instead manually keep each image's configuration in sync.

To resolve this we re-organize the build system such that it can build
multiple executables.

For the user, building both a bootloader and an application now looks
like:

# Build and flash both MCUboot and the signed child image hello_world:
west update
cmake -H$ZEPHYR_BASE/samples/hello_world -DCONFIG_BOOTLOADER_MCUBOOT=y -Bbuild
ninja -C build flash

In contrast to the 8 or so commands that were needed previously, or the
five commands that are needed with west:
#12903 (comment)

Internally, the build system uses two mechanisms to prevent the
images from conflicting with each other. First, each image is
executed in its own 'subdirectory', created with
'add_subdirectory'. This ensures that variables set in a
subdirectory (image 3) do not corrupt the variables set in the
parent directory (image 2).

e.g. in the code snippet below:

set(FOO 0)

add_subdirectory(child_image child_image_build_dir)

message("FOO is ${FOO}")

Due to CMake scoping rules, the variable FOO will be '0' after
'add_subdirectory' has been processed, even if the subdirectory
temporarily set FOO to something else.

When scoping rules can not be used to prevent aliasing issues between
the images, e.g. for global symbols like target names, global
properties, and CACHE variables, a disambiguating prefix is used in
the symbol name.

Meaning, instead of using the single library name 'kernel', an
MCUboot-enabled application would have two libraries, 'kernel' and
'mcuboot_kernel'. When writing build scripts that refer to the two
kernel libraries, one would write:

kernel
mcuboot_kernel

or, equivalently:

set(IMAGE '')
${IMAGE}kernel

set(IMAGE mcuboot_)
${IMAGE}kernel

where IMAGE is set by boilerplate.cmake to the empty string '' for
the first executable that is processed, and later to 'mcuboot_' for
the MCUboot image.

or, again equivalently:

set(IMAGE '')
set(KERNEL_LIBRARY ${IMAGE}kernel)
${KERNEL_LIBRARY}

set(IMAGE mcuboot_)
set(KERNEL_LIBRARY ${IMAGE}kernel)
${KERNEL_LIBRARY}

Here, the user of the kernel library does not need to know that the
library name depends on IMAGE, only that the library should be
referred to indirectly through the variable KERNEL_LIBRARY.

The same applies to other globals, like targets, and global
properties.
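
As a minimal sketch (the source file, report target, and property name
below are only illustrative), prefixing targets and global properties
looks like this:

# IMAGE is set by boilerplate.cmake: '' for the first image,
# 'mcuboot_' for the MCUboot image.
set(IMAGE mcuboot_)

# Library and target names carry the prefix so that every image can
# define its own 'kernel' library and 'ram_report' target in the same
# CMake run.
add_library(${IMAGE}kernel STATIC kernel.c)
add_custom_target(${IMAGE}ram_report
  COMMAND ${CMAKE_COMMAND} -E echo "RAM report for image '${IMAGE}'"
)

# Global properties get the same treatment.
set_property(GLOBAL PROPERTY ${IMAGE}ZEPHYR_LIBS ${IMAGE}kernel)
get_property(libs GLOBAL PROPERTY ${IMAGE}ZEPHYR_LIBS)
message(STATUS "Libraries for image '${IMAGE}': ${libs}")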

This
fixes #7868
fixes #19918

Challenges with this approach

This does not support multi-toolchain builds, due to a limitation in CMake:
only one program can be set as the active C toolchain.

It is not clear how to eventually solve this use case. One option could be to declare it
out of scope for the build system, and instruct users to do automatic or manual recursive
make for each toolchain.

Another option could be to introduce a single delegator program that CMake believes to
be the C toolchain, which in turn delegates to each enabled toolchain when invoked by
Ninja. This is similar to how west delegates to ninja, ninja delegates to ccache,
ccache delegates to gcc, and gcc delegates to ld.
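
As a purely illustrative sketch (the delegator script name is
hypothetical), such a program would be registered in the toolchain
file as the single C compiler CMake knows about, and would itself
forward every invocation to the real compiler selected for the image
being built:

# In the toolchain file, before project()/enable_language(C):
set(CMAKE_C_COMPILER ${ZEPHYR_BASE}/scripts/toolchain_delegator.sh)
# toolchain_delegator.sh would inspect its arguments (e.g. the source
# path, or an image-specific flag) and exec the real compiler for the
# corresponding image.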

To test

sanitycheck -T$ZEPHYR_BASE/samples/subsys/ipc/openamp

TODO:

  • Update documentation
  • Cleanup git history
  • port all libraries
  • port all global properties
  • port MCUBoot
  • port all CACHE variables
  • Have build scripts use the 'west' supplied MCUBoot path
  • Add a regression test using MCUBoot
  • fix the 'set_conf_file' mechanism (Deprecate set_conf_file #13948)
  • Fix native_posix
  • Fix shields
  • Implement mixing source-based and binary-based images (for developing a DFU-able patch)
    • Support building child images
    • Support pre-built child images
    • Support omitting child images
  • Update check_compliance.py
  • Add the IMAGE prefix to the user-input CMake variables that are image-specific
  • Document when the IMAGE prefix should be used.
  • Port the OpenAMP sample
  • Port west to support flashing of individual images and all images in succession
  • Port west to support debug, attach and debugserver
  • Develop a proof-of-concept implementation of multi-toolchain support to prove that the build-time performance is acceptable.
  • Image'ify the runner-related variables
  • Rebase


@SebastianBoe SebastianBoe force-pushed the multi_image_with_mcuboot_support branch from 67c414d to 90e12b3 on February 22, 2019 14:50
@carlescufi
Member

@SebastianBoe thanks for the PR. I would like to understand first what the issue with "recursive" (i.e. using the same framework unchanged multiple times) is. Could you please briefly describe the problems with that approach?

@SebastianBoe
Collaborator Author

SebastianBoe commented Feb 25, 2019

I would like to understand first what the issue with "recursive" (i.e. using the same framework unchanged multiple times) is. Could you please briefly describe the problems with that approach?

The simple answer is that non-recursive building is a
best practice in build-system design [0].

The more concrete answer is that non-recursive builds will be
both faster and easier for users to interact with.

Our goal is to produce multiple elf files, or images, and to give
the user an easy way to get them signed and flashed on to a
device.

There are two ways to do multi-image builds with Zephyr. Either
have CMake execute the same build scripts multiple times, or
execute CMake multiple times on the same build scripts.

If not familiar with CMake, then imagine the difference between
having python run the same code multiple times vs. having python
invoke another python interpreter.

All mainstream build systems use the same model internally: you
first construct/configure a graph (a DAG) of dependencies
between files, and then use this data structure to quickly
determine which files need to be re-generated to incrementally
build the file that is being requested. In addition to speeding
up incremental builds, the data structure is also used to
determine which files can be built in parallel, which speeds up
the build through parallel execution.

Meaning, this data structure is important for the speed of the
build, and therefore also its usability. Slow tools hamper
productivity.

The most important difference between "recursive make" and
non-recursive make is that it determines whether you have a
single DAG or multiple DAGs in your build. Having multiple DAGs
is inherently slower than a single DAG solution because of how
dependencies are resolved.

In a multi-DAG build there would be a parent DAG, and one or more
children DAGs that can produce files for the parent DAG. This
relationship can be recursive, creating a monstrously slow DAG of
DAGs. The parent DAG does not know which source files could cause
a rebuild in the child DAG, so it will usually be set up to
always invoke the child DAG to check. This is inherently slower
than checking all the dependencies from a single process/DAG.

There also exists a performance problem where the same child DAG
will have to be re-evaluated multiple times. Imagine that the
parent DAG has a dependency on two of the files in the child
DAG. The first file has unknown dependencies, but is known to not
depend on any of the parent's files. The dependencies of this
file would be resolved as described above, by always assuming it
might be out-of-date and invoking the child DAG. Now
assume the second file depends on a file that is generated by the
parent DAG. This would require invoking the
same child DAG multiple times, once for the first file, and
another time for the second file. Now we are duplicating work,
re-parsing the same DAG multiple times, which will be inherently
slower than parsing an aggregated DAG once.

In addition to a myriad of performance problems related to
resolving dependencies there is also an issue with duplicating
Configure-time work by each CMake process. Imagine that there is
some work done in Configure-time that is slow, and the result is
not expected to change often, like determining the location of
some host tool. Usually, in CMake, this performance problem is
solved by caching the result in the CMakeCache.txt, but when
there are multiple caches this optimization is not as effective.
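
For example, the usual caching pattern looks like the sketch below
(dtc is just one example of a host tool located at configure time);
with one CMakeCache.txt per image, every CMake process repeats the
lookup instead of reusing a shared cached result:

# Slow configure-time lookup; the result is stored in CMakeCache.txt
# so that later CMake runs in the same build directory can skip it.
find_program(DTC dtc)
if(NOT DTC)
  message(FATAL_ERROR "Unable to locate dtc")
endif()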

There is also the matter of controlling the number of running threads.
When there is only one instance of Ninja it is trivial for Ninja to control
the number of threads running, but when there are multiple instances
a shared jobserver would be required. Developing a jobserver in Python
would add complexity compared to using Ninja's jobserver, and it is not
clear how, or whether, one could achieve comparable performance.

Creating dependencies between images would also be more complex,
requiring the creation of wrapper targets and commands that invoke ninja
on the other image.
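
A sketch of what such a wrapper could look like (the directory and
target names are hypothetical); note that the child's Ninja has to be
re-invoked on every build just to find out whether its outputs are up
to date:

# The parent build wraps the child image's build in a custom target
# that shells out to the child's own generated build system.
add_custom_target(mcuboot_build
  COMMAND ${CMAKE_COMMAND} --build ${CMAKE_BINARY_DIR}/mcuboot
  USES_TERMINAL
)

# Anything in the parent that consumes the child's output must depend
# on the wrapper target, since the parent DAG cannot see the child's
# dependencies.
add_dependencies(zephyr_final mcuboot_build)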

When build issues arise it is more difficult to debug a multi-DAG system than
a single DAG, because Ninja has tooling for debugging its DAG, but there is
no tooling for debugging multiple Ninja instances. Consider for instance what
would happen if one had a cycle that spanned multiple DAGs instead of being
contained within a single DAG. Ninja can easily report the presence of a
cycle within its DAG, but there does not exist any tooling that
can detect cross-DAG cycles. These cycles will produce an obscure
error (infinite recursion), and be difficult to recognize,
analyze, and debug compared to a single-DAG cycle.

It also becomes more difficult to share information between
images. For instance, information that is known at configure-time
can not be shared between images at generation time with
generator expressions. Prototyping has shown that sharing
information at generation time is useful to resolve partitioning
requirements.
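
A minimal sketch of that kind of generation-time sharing (the merge
script, target names, and PARTITION_OFFSET property are hypothetical);
it is only possible because both images are targets within the same
CMake invocation:

add_custom_command(
  OUTPUT merged.hex
  COMMAND ${PYTHON_EXECUTABLE} ${ZEPHYR_BASE}/scripts/mergehex.py
          --offset $<TARGET_PROPERTY:mcuboot_zephyr_final,PARTITION_OFFSET>
          $<TARGET_FILE:mcuboot_zephyr_final>
          $<TARGET_FILE:zephyr_final>
          --output merged.hex
  DEPENDS mcuboot_zephyr_final zephyr_final
)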

In addition to performance considerations there is a matter of
usability. In the non-recursive build the user only has one
generated build system to interface with, e.g.:

ninja help # User discovers available targets

ninja mcuboot_ram_report # User retrieves a report for the MCUBoot image
ninja         ram_report # User retrieves a report for toplevel image

ninja mcuboot_menuconfig # User configures the MCUBoot image
ninja         menuconfig # User configures the toplevel image

Whereas with a recursive build the user would have to somehow
discover where in the build directory the generated build system
for the child image is located and then change the working
directory appropriately.

In addition to an awkward target interface it is not clear how
one would propagate flags given to the parent CMake/Ninja on
to all CMake/Ninja instances.

And I'm sure there are many other issues that I am not aware of,
which is why I believe it is best to stick to the established best
practice of non-recursive building.

[0] Miller, P.A. (1998), Recursive Make Considered Harmful,
Journal of AUUG Inc.

@tejlmand
Collaborator

For this to work out of the box, an update will be needed to west.yml

If you add the following to west.yml

  # In the 'remotes' section:
  - name: mcuboot
    url-base: https://github.com/JuulLabs-OSS

  # In the 'projects' section:
  - name: mcuboot
    remote: mcuboot
    revision: pull/430/head

That will allow people to check out this PR and simply do a west update, which will fetch JuulLabs-OSS/mcuboot#430 automatically.

And then, when JuulLabs-OSS/mcuboot#430 is merged, update this PR to the corresponding SHA in mcuboot.

@SebastianBoe SebastianBoe force-pushed the multi_image_with_mcuboot_support branch from 90e12b3 to b30b83a on February 28, 2019 09:55
@SebastianBoe
Collaborator Author

For this to work out of the box, an update will be needed to west.yml

Done

@SebastianBoe SebastianBoe requested a review from dbkinder as a code owner March 1, 2019 10:49
@SebastianBoe SebastianBoe force-pushed the multi_image_with_mcuboot_support branch from 91ac66e to c3185b0 on March 1, 2019 11:07
@SebastianBoe SebastianBoe added the 'area: Build System' and 'Maintainer' (at least 2 days before merge, maintainer review required) labels Mar 1, 2019
@SebastianBoe SebastianBoe force-pushed the multi_image_with_mcuboot_support branch 4 times, most recently from 5c549e5 to 682ed49, on March 1, 2019 12:41
@SebastianBoe SebastianBoe force-pushed the multi_image_with_mcuboot_support branch from c47cf7e to 36a4e8a on October 10, 2019 13:36
@SebastianBoe SebastianBoe force-pushed the multi_image_with_mcuboot_support branch 2 times, most recently from 95cab84 to e5c819d, on October 14, 2019 10:33
@SebastianBoe SebastianBoe requested review from mbolivar and removed request for wentongwu and a user October 14, 2019 11:18
Add support for non-recursive single-toolchain multi-image
builds. This resolves zephyrproject-rtos#7868.

We are seeing the need to boot Zephyr applications in multiple
stages; a real-world use case of even a 3-stage bootloader has
been identified and tested.

Each bootloader needs to be individually upgradeable and have
its own configuration of Zephyr libraries. To achieve this, each
bootloader is organized as its own executable.

The problem is that the build system can only build one
executable. This creates a usability issue as the user must invoke a
large set of arcane commands to build, sign, and flash each
bootloader.

To resolve this we re-organize the build system such that it can build
multiple executables.

Signed-off-by: Sebastian Bøe <[email protected]>
Signed-off-by: Håkon Øye Amundsen <[email protected]>
Port the openamp sample to use multi-image.

Signed-off-by: Sebastian Bøe <[email protected]>
SebastianBoe added a commit to SebastianBoe/zephyr that referenced this pull request Oct 18, 2019
Note: This was originally submitted upstream in zephyrproject-rtos#13672 and was
      rejected. NCS is carrying it as a noup for now. Another,
      more upstream-friendly approach will be needed.

We are seeing the need to boot Zephyr applications in multiple
stages; a real-world use case of even a 3-stage bootloader has
been identified and tested.

Each bootloader needs to be individually upgradeable and have
its own configuration of Zephyr libraries. To achieve this, each
bootloader is organized as its own executable.

The problem is that the build system can only build one
executable. This creates a usability issue as the user must invoke a
large set of arcane commands to build, sign, and flash each
bootloader.

To resolve this we re-organize the build system such that it can build
multiple executables. To work within CMake restrictions that logical
target names must be globally unique (among other naming
restrictions), we add an IMAGE variable which is used to name the
current Zephyr application being built.

Signed-off-by: Sebastian Bøe <[email protected]>
Signed-off-by: Håkon Øye Amundsen <[email protected]>
Signed-off-by: Øyvind Rønningstad <[email protected]>
Signed-off-by: Robert Lubos <[email protected]>
Signed-off-by: Ruth Fuchss <[email protected]>

(cherry picked from commit 3db22ac)
(Conflicts were resolved in lib/posix/CMakeLists.txt. Changes to
 ia32.cmake were dropped entirely since Intel has rejected this
 approach.

 Finally, these follow-up fixes and dependent patches were squashed
 in:

  02eeeb7
  c6c3fed
  56f6a33
  1dc0ca5
  6e348b8)
Signed-off-by: Marti Bolivar <[email protected]>

(cherry picked from commit 1e0a3dc)
(Conflict resolved in cmake/app/boilerplate.cmake)
Signed-off-by: Robert Lubos <[email protected]>
@SebastianBoe SebastianBoe force-pushed the multi_image_with_mcuboot_support branch from e5c819d to 1f54601 on October 22, 2019 10:18
@SebastianBoe SebastianBoe requested a review from a user October 22, 2019 10:18
joerchan pushed a commit to joerchan/zephyr that referenced this pull request Oct 30, 2019
From upstream zephyrproject-rtos#13672

Added a minimal integration test for MCUBoot. It has just enough
coverage to catch build issues, which is better than before (no
coverage in the Zephyr CI).

Signed-off-by: Sebastian Bøe <[email protected]>
(cherry picked from commit 1937b41)
(cherry picked from commit 026bd21)
(cherry picked from commit 4b3a15b)
Signed-off-by: Martí Bolívar <[email protected]>
joerchan pushed a commit to joerchan/zephyr that referenced this pull request Oct 30, 2019
@mike-scott mike-scott removed their request for review November 4, 2019 22:08
@SebastianBoe
Collaborator Author

Closing permanently.

Upstream did not accept a solution that increased the complexity of the build system in this manner.

SebastianBoe added a commit to SebastianBoe/zephyr that referenced this pull request Dec 11, 2019