cmake compiler cache failures in CI #9992

Closed
nashif opened this issue Sep 14, 2018 · 5 comments · Fixed by #10481
Labels: area: Build System, bug, priority: low

Comments

@nashif
Member

nashif commented Sep 14, 2018

see https://app.shippable.com/github/zephyrproject-rtos/zephyr/runs/22294/4/console

-- Performing Test toolchain_is_ok - Success
CMake Error at /home/buildslave/.cache/zephyr/ToolchainCapabilityDatabase.cmake:76:
  Parse error.  Function missing ending ")".  End of file reached.
Call Stack (most recent call first):
  ../../../cmake/extensions.cmake:638 (include)
  ../../../cmake/extensions.cmake:1020 (zephyr_check_compiler_flag)
  ../../../cmake/extensions.cmake:996 (target_cc_option_fallback)
  ../../../cmake/extensions.cmake:106 (target_cc_option)
  ../../../CMakeLists.txt:212 (zephyr_cc_option)


@nashif nashif added the bug The issue is a bug, or the PR is fixing a bug label Sep 14, 2018
@nashif nashif added the priority: low Low impact/importance bug label Sep 14, 2018
@SebastianBoe
Collaborator

Will get this fixed. Any information / half-baked-theories would be useful ...

@SebastianBoe
Collaborator

SebastianBoe commented Sep 17, 2018

Possible causes:

  • The process that writes to the cache is killed mid-line
  • Some cache entries are not parsable as set(a b) code.
  • The process that reads the cache does so while another process is mid-line in a write
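All three theories end the same way: the reader sees a line that is not a complete set(key value) entry. A minimal sketch of a checker that would flag such lines, written in Python rather than CMake so it can be run standalone. The regular expression, the function name and the sample keys are assumptions for illustration, not anything taken from the Zephyr build system:

```python
import re

# Hypothetical pattern for one complete cache entry of the form set(<key> <value>).
ENTRY = re.compile(r"^set\(\S+ \S+\)$")

def incomplete_entries(cache_text):
    """Return the 1-based numbers of lines that are not complete set(key value) entries."""
    bad = []
    for number, line in enumerate(cache_text.splitlines(), start=1):
        if line and not ENTRY.match(line):
            bad.append(number)
    return bad

complete = "set(flag_a 1)\nset(flag_b 0)\n"
# A writer killed (or observed) mid-line leaves the last entry without its ")".
truncated = complete + "set(flag_c"
```

Calling incomplete_entries(truncated) reports the last line, which is the kind of input that would make an include() of the file stop with a missing ")" parse error.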

@nashif
Member Author

nashif commented Sep 20, 2018

@SebastianBoe
Collaborator

I see from those logs that there is a new symptom (or a new root cause). CMake runs the command

  file(
    APPEND
    ${ZEPHYR_TOOLCHAIN_CAPABILITY_CACHE}
    "set(${key} ${inner_check})\n"
    )

and fails with the error:

CMake Error at ../../../../cmake/extensions.cmake:671 (file):
  file failed to open for writing (No such file or directory):

    /home/buildslave/.cache/zephyr/ToolchainCapabilityDatabase.cmake
Call Stack (most recent call first):
  ../../../../cmake/extensions.cmake:1020 (zephyr_check_compiler_flag)
  ../../../../cmake/extensions.cmake:996 (target_cc_option_fallback)
  ../../../../cmake/extensions.cmake:106 (target_cc_option)
  ../../../../CMakeLists.txt:210 (zephyr_cc_option)

I cannot yet explain how this can happen. But I can say that it is not a permission issue: I have tested what happens with insufficient permissions, and the error message is different (Permission denied). Note that file(APPEND ...) should create the file when it doesn't exist, so there is no user error here.

Will continue investigating ...
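For reference, on a POSIX-like system an append-mode open creates a missing file, but reports "No such file or directory" (ENOENT) when a directory in the path is missing, which is a different error from "Permission denied" (EACCES). The Python sketch below only illustrates that distinction. It does not show that a missing directory was what happened in CI, and the helper name and file names are made up:

```python
import errno
import os
import tempfile

def try_append(path, text):
    """Append text to path; return None on success or the errno name on failure."""
    try:
        with open(path, "a") as handle:
            handle.write(text)
        return None
    except OSError as error:
        return errno.errorcode.get(error.errno, str(error.errno))

with tempfile.TemporaryDirectory() as root:
    present = os.path.join(root, "db.cmake")
    missing_parent = os.path.join(root, "gone", "db.cmake")
    # The file does not exist yet, but its directory does: append creates it.
    result_present = try_append(present, "set(key 1)\n")
    # The directory "gone" was never created: the open itself fails.
    result_missing = try_append(missing_parent, "set(key 1)\n")
```

Here result_present is None (the file was created), while result_missing names the ENOENT error.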

@SebastianBoe
Collaborator

Have added more debug information in PR #11027. Will investigate when it triggers with the same symptoms again (AFAICT it has not triggered for a week now).

SebastianBoe added a commit to SebastianBoe/zephyr that referenced this issue Oct 10, 2018
CI sometimes fails with a temporarily corrupted toolchain capability
database file. Although not proven, there is evidence that CMake's
file(APPEND ...) does not work atomically when there are concurrent writes
and reads of a certain size.

To avoid file appending, we re-write the key-value database
implementation to store keys in filenames and values in individual
single-byte files.

This (most likely) fixes zephyrproject-rtos#9992.

NB: Users that have been overriding the database file location with
the CMake variable ZEPHYR_TOOLCHAIN_CAPABILITY_CACHE must now specify
a directory with the variable ZEPHYR_TOOLCHAIN_CAPABILITY_CACHE_DIR.

Signed-off-by: Sebastian Bøe <[email protected]>
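The layout described in this commit message (one file per key, a single byte as the value) can be sketched as follows. This is a Python model of the idea, not the CMake code from the commit: hashing the key with SHA-256 to get a filename, the function names, and the "1"/"0" encoding are all assumptions made for the example.

```python
import hashlib
import os
import tempfile

def entry_path(cache_dir, key):
    # Hypothetical naming scheme: hash the key so any flag string maps to a safe filename.
    return os.path.join(cache_dir, hashlib.sha256(key.encode()).hexdigest())

def store(cache_dir, key, value):
    """Record a boolean check result as a single byte, "1" or "0"."""
    os.makedirs(cache_dir, exist_ok=True)
    with open(entry_path(cache_dir, key), "w") as handle:
        handle.write("1" if value else "0")

def lookup(cache_dir, key):
    """Return True/False for a cached result, or None when the key is absent."""
    try:
        with open(entry_path(cache_dir, key)) as handle:
            data = handle.read()
    except FileNotFoundError:
        return None
    # An empty or unexpected file is treated as a cache miss.
    return {"1": True, "0": False}.get(data)

with tempfile.TemporaryDirectory() as cache_dir:
    store(cache_dir, "-Wall", True)
    store(cache_dir, "-fno-such-flag", False)
    results = (
        lookup(cache_dir, "-Wall"),
        lookup(cache_dir, "-fno-such-flag"),
        lookup(cache_dir, "-O2"),
    )
```

Because no file is ever appended to, a reader can no longer land in the middle of somebody else's line; the worst case is a missing or empty entry, which the sketch treats as a miss so the check is simply run again.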
nashif pushed a commit that referenced this issue Oct 19, 2018
marc-hb added a commit to marc-hb/zephyr that referenced this issue Apr 2, 2019
While this race seems unlikely and harmless, let's play it safe and
implement the usual solution: temporary file + atomic rename.

The race is unlikely and maybe even harmless because:

- Only sanitycheck seems to invoke cmake concurrently.
- Users rarely delete their ~/.cache/zephyr/ToolchainCapabilityDatabase/
- All concurrent cmake processes write the same, single byte to the same
  files.
- Creating a single byte is at least very fast, so extremely short
  window for others to read an empty file.

For additional background see links in issue zephyrproject-rtos#9992

Signed-off-by: Marc Herbert <[email protected]>
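A minimal sketch of the "temporary file + atomic rename" pattern named in this commit message, in Python rather than CMake. The function name and the file name are hypothetical; the atomicity relies on the temporary file and the target being on the same filesystem, which is why the temporary file is created next to the target.

```python
import os
import tempfile

def atomic_write(path, data):
    """Write data to path so readers see either the old file or the complete new one."""
    directory = os.path.dirname(path) or "."
    # The temporary file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(data)
        # Replaces the target in one step; atomic on POSIX within a single filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        # Do not leave a stray temporary file behind if anything went wrong.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise

with tempfile.TemporaryDirectory() as cache_dir:
    target = os.path.join(cache_dir, "entry")
    atomic_write(target, "1")
    with open(target) as handle:
        content = handle.read()
    leftovers = sorted(os.listdir(cache_dir))
```

With this pattern a concurrent reader never observes the empty file that exists between creating an entry and writing its single byte.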
nashif pushed a commit that referenced this issue Apr 17, 2019