
Add support for non-recursive single-toolchain multi-image builds #13672

Conversation

SebastianBoe
Collaborator

@SebastianBoe SebastianBoe commented Feb 22, 2019

We are seeing the need to boot Zephyr applications in multiple
stages; a real-world use case of even a 3-stage bootloader has
been identified and tested.

Each bootloader needs to be individually upgradeable and have
its own configuration of Zephyr libraries. To achieve this, each
bootloader is organized as its own executable.

The problem is that the build system can only build one
executable. This creates a usability issue as the user must
invoke a large set of arcane commands to build, sign, and flash
each bootloader.

It also causes a problem for the configuration system. If the build and configuration
system can only build one image it will not be able to automatically share configuration
between images or enforce that one image is configured to be compatible with another
image. The user must instead manually keep each image's configuration in sync.

To resolve this we re-organize the build system such that it can build
multiple executables.

For the user, building both a bootloader and an application now looks
like:

# Build and flash both MCUboot and the signed child image hello_world:
west update
cmake -H$ZEPHYR_BASE/samples/hello_world -DCONFIG_BOOTLOADER_MCUBOOT=y -Bbuild
ninja -C build flash

In contrast to the 8 or so commands that were needed previously, or the
five commands that are needed with west:
#12903 (comment)

Internally, the build system uses two mechanisms to prevent the
images from conflicting with each other. First, each image is
executed in its own 'subdirectory', created with
'add_subdirectory'. This ensures that variables set in a
subdirectory (image 3) do not corrupt the variables set in the
parent directory (image 2).

e.g. in the code snippet below:

set(FOO 0)

add_subdirectory(child_image child_image_build_dir)

message("FOO is ${FOO}")

Due to CMake scoping rules, the variable FOO will be '0' after
'add_subdirectory' has been processed, even if the subdirectory
temporarily set FOO to something else.

When scoping rules can not be used to prevent aliasing issues between
the images, e.g. for global symbols like target names, global
properties, and CACHE variables, a disambiguating prefix is used in
the symbol name.

Meaning, instead of using the single library name 'kernel', an
MCUboot-enabled application would have two libraries, 'kernel' and
'mcuboot_kernel'. When writing build scripts that refer to the two
kernel libraries, one would write:

kernel
mcuboot_kernel

or, equivalently:

set(IMAGE '')
${IMAGE}kernel

set(IMAGE mcuboot_)
${IMAGE}kernel

where IMAGE is set by boilerplate.cmake to the empty string '' for
the first executable that is processed, and later to 'mcuboot_' for
the MCUboot image.

or, again equivalently:

set(IMAGE '')
set(KERNEL_LIBRARY ${IMAGE}kernel)
${KERNEL_LIBRARY}

set(IMAGE mcuboot_)
set(KERNEL_LIBRARY ${IMAGE}kernel)
${KERNEL_LIBRARY}

Here, the user of the kernel library does not need to know that the
library name depends on IMAGE, only that the library should be
referred to indirectly through the variable KERNEL_LIBRARY.

The same applies to other globals, like targets, and global
properties.
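
As a minimal sketch (the source file, report target, and property name
below are only illustrative), prefixing targets and global properties
looks like this:

# IMAGE is set by boilerplate.cmake: '' for the first image,
# 'mcuboot_' for the MCUboot image.
set(IMAGE mcuboot_)

# Library and target names carry the prefix so that every image can
# define its own 'kernel' library and 'ram_report' target in the same
# CMake run.
add_library(${IMAGE}kernel STATIC kernel.c)
add_custom_target(${IMAGE}ram_report
  COMMAND ${CMAKE_COMMAND} -E echo "RAM report for image '${IMAGE}'"
)

# Global properties get the same treatment.
set_property(GLOBAL PROPERTY ${IMAGE}ZEPHYR_LIBS ${IMAGE}kernel)
get_property(libs GLOBAL PROPERTY ${IMAGE}ZEPHYR_LIBS)
message(STATUS "Libraries for image '${IMAGE}': ${libs}")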

This
fixes #7868
fixes #19918

Challenges with this approach

This does not support multi-toolchain builds, due to a limitation in CMake:
only one program can be set as the active C toolchain.

It is not clear how to eventually solve this use case. One option could be to declare it
out of scope for the build system, and instruct users to do automatic or manual recursive
make for each toolchain.

Another option could be to introduce a single delegator program that CMake believes to
be the C toolchain, which in turn delegates to each enabled toolchain when invoked by
Ninja. This is similar to how west delegates to ninja, ninja delegates to ccache,
ccache delegates to gcc, and gcc delegates to ld.
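
As a purely illustrative sketch (the delegator script name is
hypothetical), such a program would be registered in the toolchain
file as the single C compiler CMake knows about, and would itself
forward every invocation to the real compiler selected for the image
being built:

# In the toolchain file, before project()/enable_language(C):
set(CMAKE_C_COMPILER ${ZEPHYR_BASE}/scripts/toolchain_delegator.sh)
# toolchain_delegator.sh would inspect its arguments (e.g. the source
# path, or an image-specific flag) and exec the real compiler for the
# corresponding image.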

To test

sanitycheck -T$ZEPHYR_BASE/samples/subsys/ipc/openamp

TODO:

  • Update documentation
  • Cleanup git history
  • port all libraries
  • port all global properties
  • port MCUBoot
  • port all CACHE variables
  • Have build scripts use the 'west' supplied MCUBoot path
  • Add a regression test using MCUBoot
  • fix the 'set_conf_file' mechanism (Deprecate set_conf_file #13948)
  • Fix native_posix
  • Fix shields
  • Implement mixing source-based and binary-based images (for developing a DFU-able patch)
    • Support building child images
    • Support pre-built child images
    • Support omitting child images
  • Update check_compliance.py
  • Add the IMAGE prefix to the user-input CMake variables that are image-specific
  • Document when the IMAGE prefix should be used.
  • Port the OpenAMP sample
  • Port west to support flashing of individual images and all images in succession
  • Port west to support debug, attach and debugserver
  • Develop a proof-of-concept implementation of multi-toolchain support to prove that the build-time performance is acceptable.
  • Image'ify the runner-related variables
  • Rebase


@SebastianBoe SebastianBoe force-pushed the multi_image_with_mcuboot_support branch from 67c414d to 90e12b3 on February 22, 2019 14:50
@carlescufi
Member

@SebastianBoe thanks for the PR. I would like to understand first what the issue with "recursive" (i.e. using the same framework unchanged multiple times) is. Could you please briefly describe the problems with that approach?

@SebastianBoe
Collaborator Author

SebastianBoe commented Feb 25, 2019

I would like to understand first what the issue with "recursive" (i.e. using the same framework unchanged multiple times) is. Could you please briefly describe the problems with that approach?

The simple answer is that non-recursive building is a
best practice in build-system design [0].

The more concrete answer is that non-recursive builds will be
both faster and easier for users to interact with.

Our goal is to produce multiple elf files, or images, and to give
the user an easy way to get them signed and flashed on to a
device.

There are two ways to do multi-image builds with Zephyr. Either
have CMake execute the same build scripts multiple times, or
execute CMake multiple times on the same build scripts.

If not familiar with CMake, then imagine the difference between
having python run the same code multiple times vs. having python
invoke another python interpreter.

All mainstream build systems use the same model internally: you
first construct/configure a graph (a DAG) of dependencies
between files, and then use this data structure to quickly
determine which files need to be re-generated to incrementally
build the file that is being requested. In addition to speeding
up incremental builds, the data structure is also used to
determine which files can be built in parallel, which speeds up
the build through parallel execution.

Meaning, this data structure is important for the speed of the
build, and therefore also its usability. Slow tools hamper
productivity.

The most important difference between "recursive make" and
non-recursive make is that it determines whether you have a
single DAG or multiple DAGs in your build. Having multiple DAGs
is inherently slower than a single DAG solution because of how
dependencies are resolved.

In a multi-DAG build there would be a parent DAG, and one or more
children DAGs that can produce files for the parent DAG. This
relationship can be recursive, creating a monstrously slow DAG of
DAGs. The parent DAG does not know which source files could cause
a rebuild in the child DAG, so it will usually be set up to
always invoke the child DAG to check. This is inherently slower
than checking all the dependencies from a single process/DAG.

There also exists a performance problem where the same child DAG
will have to be re-evaluated multiple times. Imagine that the
parent DAG has a dependency on two of the files in the child
DAG. The first file has unknown dependencies, but is known to not
depend on any of the parent's files. The dependencies of this
file would be resolved as described above, by always assuming it
might be out-of-date and invoking the child DAG. Now
assume the second file depends on a file that is generated by the
parent DAG. This would require invoking the
same child DAG multiple times, once for the first file, and
another time for the second file. Now we are duplicating work,
re-parsing the same DAG multiple times, which will be inherently
slower than parsing an aggregated DAG once.

In addition to a myriad of performance problems related to
resolving dependencies there is also an issue with duplicating
Configure-time work by each CMake process. Imagine that there is
some work done in Configure-time that is slow, and the result is
not expected to change often, like determining the location of
some host tool. Usually, in CMake, this performance problem is
solved by caching the result in the CMakeCache.txt, but when
there are multiple caches this optimization is not as effective.
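
For example, the usual caching pattern looks like the sketch below
(dtc is just one example of a host tool located at configure time);
with one CMakeCache.txt per image, every CMake process repeats the
lookup instead of reusing a shared cached result:

# Slow configure-time lookup; the result is stored in CMakeCache.txt
# so that later CMake runs in the same build directory can skip it.
find_program(DTC dtc)
if(NOT DTC)
  message(FATAL_ERROR "Unable to locate dtc")
endif()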

There is also the matter of controlling the number of running threads.
When there is only one instance of Ninja it is trivial for Ninja to control
the number of threads running, but when there are multiple instances
a shared jobserver would be required. Developing a jobserver in Python
would add complexity compared to using Ninja's jobserver, and it is not
clear how, or whether, one could achieve comparable performance.

Creating dependencies between images would also be more complex,
requiring the creation of wrapper targets and commands that invoke ninja
on the other image.
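
A sketch of what such a wrapper could look like (the directory and
target names are hypothetical); note that the child's Ninja has to be
re-invoked on every build just to find out whether its outputs are up
to date:

# The parent build wraps the child image's build in a custom target
# that shells out to the child's own generated build system.
add_custom_target(mcuboot_build
  COMMAND ${CMAKE_COMMAND} --build ${CMAKE_BINARY_DIR}/mcuboot
  USES_TERMINAL
)

# Anything in the parent that consumes the child's output must depend
# on the wrapper target, since the parent DAG cannot see the child's
# dependencies.
add_dependencies(zephyr_final mcuboot_build)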

When build issues arise it is more difficult to debug a multi-DAG system than
a single DAG, because Ninja has tooling for debugging its DAG, but there is
no tooling for debugging multiple Ninja instances. Consider for instance what
would happen if one had a cycle that spanned multiple DAGs instead of being
contained within a single DAG. Ninja can easily report the presence of a
cycle within its DAG, but there does not exist any tooling that
can detect cross-DAG cycles. These cycles will produce an obscure
error (infinite recursion), and be difficult to recognize,
analyze, and debug compared to a single-DAG cycle.

It also becomes more difficult to share information between
images. For instance, information that is known at configure-time
can not be shared between images at generation time with
generator expressions. Prototyping has shown that sharing
information at generation time is useful to resolve partitioning
requirements.
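
A minimal sketch of that kind of generation-time sharing (the merge
script, target names, and PARTITION_OFFSET property are hypothetical);
it is only possible because both images are targets within the same
CMake invocation:

add_custom_command(
  OUTPUT merged.hex
  COMMAND ${PYTHON_EXECUTABLE} ${ZEPHYR_BASE}/scripts/mergehex.py
          --offset $<TARGET_PROPERTY:mcuboot_zephyr_final,PARTITION_OFFSET>
          $<TARGET_FILE:mcuboot_zephyr_final>
          $<TARGET_FILE:zephyr_final>
          --output merged.hex
  DEPENDS mcuboot_zephyr_final zephyr_final
)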

In addition to performance considerations there is a matter of
usability. In the non-recursive build the user only has one
generated build system to interface with, e.g.:

ninja help # User discovers available targets

ninja mcuboot_ram_report # User retrieves a report for the MCUBoot image
ninja         ram_report # User retrieves a report for toplevel image

ninja mcuboot_menuconfig # User configures the MCUBoot image
ninja         menuconfig # User configures the toplevel image

Whereas with a recursive build the user would have to somehow
discover where in the build directory the generated build system
for the child image is located and then change the working
directory appropriately.

In addition to an awkward target interface it is not clear how
one would propagate flags given to the parent CMake/Ninja on
to all CMake/Ninja instances.

And I'm sure there are many other issues that I am not aware of,
which is why I believe it is best to stick to the established best
practice of non-recursive building.

[0] Miller, P.A. (1998), Recursive Make Considered Harmful,
Journal of AUUG Inc.

@tejlmand
Collaborator

For this to work out of the box, an update will be needed to west.yml

If you add the following to west.yml

  # In the 'remotes' section:
  - name: mcuboot
    url-base: https://github.com/JuulLabs-OSS

  # In the 'projects' section:
  - name: mcuboot
    remote: mcuboot
    revision: pull/430/head

That will allow people to check out this PR and simply do a west update, which will fetch JuulLabs-OSS/mcuboot#430 automatically.

And then, when JuulLabs-OSS/mcuboot#430 is merged, update this PR to the corresponding SHA in mcuboot.

@SebastianBoe SebastianBoe force-pushed the multi_image_with_mcuboot_support branch from 90e12b3 to b30b83a on February 28, 2019 09:55
@SebastianBoe
Collaborator Author

For this to work out of the box, an update will be needed to west.yml

Done

@SebastianBoe SebastianBoe requested a review from dbkinder as a code owner March 1, 2019 10:49
@SebastianBoe SebastianBoe force-pushed the multi_image_with_mcuboot_support branch from 91ac66e to c3185b0 on March 1, 2019 11:07
@SebastianBoe SebastianBoe added the 'area: Build System' and 'Maintainer' (at least 2 days before merge, maintainer review required) labels Mar 1, 2019
@SebastianBoe SebastianBoe force-pushed the multi_image_with_mcuboot_support branch 4 times, most recently from 5c549e5 to 682ed49, on March 1, 2019 12:41
@SebastianBoe SebastianBoe force-pushed the multi_image_with_mcuboot_support branch from c47cf7e to 36a4e8a on October 10, 2019 13:36
@SebastianBoe SebastianBoe force-pushed the multi_image_with_mcuboot_support branch 2 times, most recently from 95cab84 to e5c819d, on October 14, 2019 10:33
@SebastianBoe SebastianBoe requested review from mbolivar and removed request for wentongwu and a user October 14, 2019 11:18
Add support for non-recursive single-toolchain multi-image
builds. This resolves zephyrproject-rtos#7868.

We are seeing the need to boot Zephyr applications in multiple
stages; a real-world use case of even a 3-stage bootloader has
been identified and tested.

Each bootloader needs to be individually upgradeable and have
its own configuration of Zephyr libraries. To achieve this, each
bootloader is organized as its own executable.

The problem is that the build system can only build one
executable. This creates a usability issue as the user must invoke a
large set of arcane commands to build, sign, and flash each
bootloader.

To resolve this we re-organize the build system such that it can build
multiple executables.

Signed-off-by: Sebastian Bøe <[email protected]>
Signed-off-by: Håkon Øye Amundsen <[email protected]>
Port the openamp sample to use multi-image.

Signed-off-by: Sebastian Bøe <[email protected]>
SebastianBoe added a commit to SebastianBoe/zephyr that referenced this pull request Oct 18, 2019
Note: This was originally submitted upstream in zephyrproject-rtos#13672 and was
      rejected. NCS is carrying it as a noup for now. Another,
      more upstream-friendly approach will be needed.

We are seeing the need to boot Zephyr applications in multiple
stages; a real-world use case of even a 3-stage bootloader has
been identified and tested.

Each bootloader needs to be individually upgradeable and have
its own configuration of Zephyr libraries. To achieve this, each
bootloader is organized as its own executable.

The problem is that the build system can only build one
executable. This creates a usability issue as the user must invoke a
large set of arcane commands to build, sign, and flash each
bootloader.

To resolve this we re-organize the build system such that it can build
multiple executables. To work within CMake restrictions that logical
target names must be globally unique (among other naming
restrictions), we add an IMAGE variable which is used to name the
current Zephyr application being built.

Signed-off-by: Sebastian Bøe <[email protected]>
Signed-off-by: Håkon Øye Amundsen <[email protected]>
Signed-off-by: Øyvind Rønningstad <[email protected]>
Signed-off-by: Robert Lubos <[email protected]>
Signed-off-by: Ruth Fuchss <[email protected]>

(cherry picked from commit 3db22ac)
(Conflicts were resolved in lib/posix/CMakeLists.txt. Changes to
 ia32.cmake were dropped entirely since Intel has rejected this
 approach.

 Finally, these follow-up fixes and dependent patches were squashed
 in:

  02eeeb7
  c6c3fed
  56f6a33
  1dc0ca5
  6e348b8)
Signed-off-by: Marti Bolivar <[email protected]>

(cherry picked from commit 1e0a3dc)
(Conflict resolved in cmake/app/boilerplate.cmake)
Signed-off-by: Robert Lubos <[email protected]>
@SebastianBoe SebastianBoe force-pushed the multi_image_with_mcuboot_support branch from e5c819d to 1f54601 on October 22, 2019 10:18
@SebastianBoe SebastianBoe requested a review from a user October 22, 2019 10:18
joerchan pushed a commit to joerchan/zephyr that referenced this pull request Oct 30, 2019
From upstream zephyrproject-rtos#13672

Added a minimal integration test for MCUBoot. It has just enough
coverage to catch build issues, which is better than before (no
coverage in the Zephyr CI).

Signed-off-by: Sebastian Bøe <[email protected]>
(cherry picked from commit 1937b41)
(cherry picked from commit 026bd21)
(cherry picked from commit 4b3a15b)
Signed-off-by: Martí Bolívar <[email protected]>
joerchan pushed a commit to joerchan/zephyr that referenced this pull request Oct 30, 2019
@mike-scott mike-scott removed their request for review November 4, 2019 22:08
@SebastianBoe
Collaborator Author

Closing permanently.

Upstream did not accept a solution that increased the complexity of the build system in this manner.

SebastianBoe added a commit to SebastianBoe/zephyr that referenced this pull request Dec 11, 2019