Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Logger abi check #3195

Merged
merged 5 commits into from
Aug 18, 2020
Merged

Logger abi check #3195

merged 5 commits into from
Aug 18, 2020

Conversation

ktrzcinx
Copy link
Member

@ktrzcinx ktrzcinx commented Jul 15, 2020

Compatibility between fw and ldc file is checked in logger tool before logs convertion.
This mechanism was comparing sof_ipc_fw_version->abi_version saved in ldc file and
in fw_ready message, but in ldc this field is filled with DBG_ABI and in fw_ready with ABI,
so it wasn't working at all. Moreover FW and LDC file must mach exactly, so comparing ABI
version is too generous approach, it's much better to calculate fw hash for comparison.

The best would be to calculate executable binary hash and save it back to binary file and to ldc file,
then for example changing unlinked code, or comment in source code won't break ldc and firmware compatibility.
But this approach needs quite complex edition of output executable file or double compilation.

Much simpler approach is to hash source code before compilation and save this value in separate file,
then each whitespace has impact on FW hash, but it's shouldn't be real problem. In this approach
git tools can be used to calculate this value, so there no need for any new external tool.

See: #3243

Extracted from discussion:

we have two approaches:

First, hashing whole tracked source code :

Profits:

+Possibility to catch any local change (FW incompatibility with git source code pointed by hash commit)
+Straight forward build process, without additional hooks / build steps - easy to maintain
Drawbacks:

-Every time fw is updated, ldc should be updated too. (ATM compatibility check is optional, but in version where it will be checked by default, there should be flag to force 'no check' for situations like this - just like git hooks works).
Second, hashing only dictionary, and post build fw_ready section edit :

Profits:

+Dictionary compatibility kept after small FW changes (even one line add/delete breaks ldc compatibility because line number related with log message is saved in ldc file)
Drawbacks:

-Post build elf edition - one more obligatory post build step / making SMEX run obligatory and adding to them hashing library or double obligatory SMEX run (first to produce ldc file, then hash it in CMake, and second one SMEX, to fill value in fw_ready section). SMEX may be chosen because atm. SMEX is the only 'external' user of fw_ready section in elf file.

Maybe I missed something?
I choose 1. version because I prefer as simple and clear build-system architecture as it is possible.

Copy link
Collaborator

@paulstelian97 paulstelian97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this still needs to go through the ABI change procedure but the change itself is good.

Copy link
Member

@lgirdwood lgirdwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mmaka1 any comments

scripts/cmake/version.cmake Outdated Show resolved Hide resolved
@@ -54,9 +54,10 @@ struct sof_ipc_fw_version {
uint8_t time[10];
uint8_t tag[6];
uint32_t abi_version;
uint32_t fw_hash; /* used to check FW and ldc file compatibility */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is this different from git version hash ?

Copy link
Member Author

@ktrzcinx ktrzcinx Jul 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SOF_TAG refer git commit hash, changed after each commit, so I can edit file without commit, what wouldn't change SOF_TAG but will break ldc file compatibility. fw_hash will change value after each edit of any file tracked in git (answer for following question).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, so this need to be made clear in the comments alongside it's reproducability.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

@@ -32,6 +32,7 @@ const struct ext_man_fw_version ext_man_fw_ver
#endif
.tag = SOF_TAG,
.abi_version = SOF_ABI_VERSION,
.fw_hash = SOF_FW_HASH,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this reproducible - i.e are you hashing any generated files that could differ for different users.

Copy link
Collaborator

@lyakh lyakh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'SOF_TAG' refer git commit hash, changed after each commit, so I can edit file without commit, what wouldn't change 'SOF_TAG' but will break ldc file compatibility. 'fw_hash' will change value after each edit of any file tracked in git (answer for following question).

@ktrzcinx if you want to track uncommitted changes, then this isn't sufficient either. You can add (without committing) a new `#include' to a .c or .h file and then change that new header and since it isn't known to git, it won't be hashed. Maybe you need to get a file list from the compiler? This will then include any system headers used, but that shouldn't be harmful either.

@@ -59,6 +59,27 @@ if(NOT SOF_TAG)
set(SOF_TAG 0)
endif()

# list tracked files from src directory
execute_process(COMMAND git ls-files src/*
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this any different from just git ls-files stc?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't notice any difference in results, I used src/* to indicate that I want files from some directory. Updated to version with src.

tools/logger/convert.c Outdated Show resolved Hide resolved
@ktrzcinx ktrzcinx force-pushed the logger-abi-check branch 2 times, most recently from 43a7391 to 1e210cf Compare July 20, 2020 14:54
@ktrzcinx
Copy link
Member Author

@ktrzcinx if you want to track uncommitted changes, then this isn't sufficient either. You can add (without committing) a new `#include' to a .c or .h file and then change that new header and since it isn't known to git, it won't be hashed. Maybe you need to get a file list from the compiler? This will then include any system headers used, but that shouldn't be harmful either.

I'm little afraid that hashing system headers may impact hash reproducability - user with different toolchain version perhaps calculate different hash version, what's not an issue with my approach (because only hashing sof-git-tracked files).

Described scenario with `#include' to a .c or .h file actually can be made, but it's quite rare scenario. Please remember, it's not security check, it's rather to warn about outdated / incapatible ldc file usage.

SOF_ABI_VERSION_MINOR(SOF_ABI_DBG_VERSION),
SOF_ABI_VERSION_PATCH(SOF_ABI_DBG_VERSION));
/* compare firmware hash value from version file and ldc file */
if (ver.fw_hash != snd->version.fw_hash) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the verification be mandatory now, as it is more reliable (-e assumed when reading live trace and -v assumed when reading from a file)?

Copy link
Member Author

@ktrzcinx ktrzcinx Jul 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds good for me, maybe then some flag, e.g. -E should be added to force running without compatibility check. It sound like something for separate PR

@marc-hb
Copy link
Collaborator

marc-hb commented Jul 31, 2020

@ktrzcinx posted an interesting example but github seems to have lost it. Here it is again:

- tr_info(dev, "Some value %d", a);
+ tr_info(dev, "Some value %d", b);

Here ldc hash will be the same for both examples, source code hash will differ.

  • If this is a "quick and dirty fix" in a local developer workspace or "between two friends" then in an ideal world and for convenience it should not require a new dictionary. This is exactly what concerns me here, thanks for the interesting example.
  • If this is real change in a commit in the shared main branch then the message (and of course the dictionary) must absolutely be changed too otherwise you'd have two different values logged the exact same way: incredibly confusing!
tr_info(dev, "Some _different_ value B %d", b);

No, it doesn't [break tracing in tarball build anymore].

Nice fallback on the git hash, smart.

(even one line add/delete breaks ldc compatibility because line number related with log message is saved in ldc file)

Very good point, I didn't know this.

there should be flag to force 'no check' for situations like this

Great, I think this would make a big difference.

Post build elf edition

You're really making this look much more complicated than it would be.

making SMEX run obligatory

SMEX is already the tool that creates the dictionary in the first place!?!

adding to them hashing library or double obligatory SMEX run (first to produce ldc file, then hash it in CMake, and second one SMEX to fill value in fw_ready section).

None of that. CMake can hash the dictionary with one-line. objcopy can insert the hash with another line, done?

@lgirdwood
Copy link
Member

SOFCI TEST

@lgirdwood
Copy link
Member

CI appears to have got stuck - rerun. @marc-hb @ktrzcinx any more comments or updates pending ?

@slawblauciak
Copy link
Collaborator

SOFCI TEST

@lgirdwood
Copy link
Member

@slawblauciak CI seems to have stalled - I'll rerun it again.

@lgirdwood
Copy link
Member

SOFCI TEST

1 similar comment
@ktrzcinx
Copy link
Member Author

SOFCI TEST

@slawblauciak
Copy link
Collaborator

@lgirdwood looks like we passed all the required checks, good to merge now?

@lgirdwood
Copy link
Member

@slawblauciak I'm seeing SCHEDULER CI failure. Can you confirm with @zrombel we are good.

@slawblauciak
Copy link
Collaborator

@lgirdwood already have, it's not relevant

@lgirdwood lgirdwood merged commit 6c14e76 into thesofproject:master Aug 18, 2020
@marc-hb
Copy link
Collaborator

marc-hb commented Aug 27, 2020

A number of comments at #3195 (comment) had no answer.

This broke the build, fixed in PR #3322

Now running "make" prints this obscure warning every time:

CMake Warning at sof/scripts/cmake/version.cmake:87 (message):
  Source content hash can't be calculated, use GIT_LOG_HASH

Looking at the source code points a possible reason which... doesn't seem to apply to me.

)
string(SUBSTRING ${SOF_FW_HASH_LONG} 0 8 SOF_FW_HASH)
message(STATUS "Firmware content hash: ${SOF_FW_HASH}")

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Easy to change later but you could move all the temp files to some new build_dir/source_hash/ subdirectory instead.

I don't think CMake builds should ever delete any temporary file because:

  1. Temporary files are useful to understand and debug the build. I very often browse or grep -r the build directory for that purpose. Then I can git grep the filenames and find the corresponding CMakeLists.txt source very quickly.
  2. With out-of-source builds there's no "pollution" concern anymore.

message(STATUS "Source content hash: ${SOF_SRC_HASH}")
else()
set(SOF_SRC_HASH ${GIT_LOG_HASH})
message(WARNING "Source content hash can't be calculated, use GIT_LOG_HASH")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

message(WARNING "Source content hash not computed without git, _using_ GIT_LOG_HASH" instead)

"Use" is imperative mood.

@ktrzcinx
Copy link
Member Author

@marc-hb I applied your suggestions in #3356

@marc-hb
Copy link
Collaborator

marc-hb commented Mar 3, 2021

I just filed #3890

While comparing various build outputs I realized "by chance" why using a source content hash for the logger ABI check (#3195) is flawed. It's quite obvious with hindsight: that's simply because the "source" that git knows about is not the source code that gets compiled. git misses at least: pre-processing, generated config.h, compiler flags and version and maybe others...

@marc-hb
Copy link
Collaborator

marc-hb commented Apr 14, 2021

With (even more) hindsight, the most obvious failure of this design is CONFIG_TRACEV. See #3890

marc-hb added a commit to marc-hb/sof that referenced this pull request Jun 10, 2021
Fixes commit 6c14e76 ("trace: Log FW ABI and hash numbers").

In June 2020, PR thesofproject#3195 added a tr_info() "banner" immediately after to
make sure tracing works.

That commit included a likely misunderstood trace_flush() call
immediately after the tr_info(). The very poorly named trace_flush()
function sounds like earlier scheduling of something that would the same
later anyway but that's absolutely not what it does. Instead it copies
pending DMA traces to the shared mailbox.

This was most likely why the FW ABI banner is randomly duplicated in the
etrace from time to time, see for instance:
https://sof-ci.01.org/linuxpr/PR2954/build5823/devicetest/?model=BYT_MB_NOCODEC&testcase=test-speaker
https://sof-ci.01.org/softestpr/PR666/build721/devicetest/?model=TGLH_RVP_HDA&testcase=check-sof-logger

Note this test failure is unrelated. For now this CI does not read
etrace unless there's a failure.

In August 2020, commit 67a0a69 ("trace: Trace initial message as
error logs") upgraded this banner from tr_info() to tr_err(). That made
sure it gets to both the mailbox and the DMA trace but it didn't remove
the trace_flush(). Remove it now.

Signed-off-by: Marc Herbert <[email protected]>
marc-hb added a commit to marc-hb/sof that referenced this pull request Jun 10, 2021
Pure rename, zero functional change.

The very poorly named trace_flush() function sounds like earlier
scheduling of something that would the same later anyway but that's
absolutely not what it does. Instead it copies pending DMA traces to the
shared mailbox.

As an example, in June 2020, PR thesofproject#3195 commit 6c14e76 ("trace: Log
FW ABI and hash numbers") added a tr_info() "banner" immediately after
to make sure tracing works. That included a likely misunderstood
trace_flush() call immediately after the tr_info().

Signed-off-by: Marc Herbert <[email protected]>
lgirdwood pushed a commit that referenced this pull request Jun 10, 2021
Fixes commit 6c14e76 ("trace: Log FW ABI and hash numbers").

In June 2020, PR #3195 added a tr_info() "banner" immediately after to
make sure tracing works.

That commit included a likely misunderstood trace_flush() call
immediately after the tr_info(). The very poorly named trace_flush()
function sounds like earlier scheduling of something that would the same
later anyway but that's absolutely not what it does. Instead it copies
pending DMA traces to the shared mailbox.

This was most likely why the FW ABI banner is randomly duplicated in the
etrace from time to time, see for instance:
https://sof-ci.01.org/linuxpr/PR2954/build5823/devicetest/?model=BYT_MB_NOCODEC&testcase=test-speaker
https://sof-ci.01.org/softestpr/PR666/build721/devicetest/?model=TGLH_RVP_HDA&testcase=check-sof-logger

Note this test failure is unrelated. For now this CI does not read
etrace unless there's a failure.

In August 2020, commit 67a0a69 ("trace: Trace initial message as
error logs") upgraded this banner from tr_info() to tr_err(). That made
sure it gets to both the mailbox and the DMA trace but it didn't remove
the trace_flush(). Remove it now.

Signed-off-by: Marc Herbert <[email protected]>
lgirdwood pushed a commit that referenced this pull request Jun 10, 2021
Pure rename, zero functional change.

The very poorly named trace_flush() function sounds like earlier
scheduling of something that would the same later anyway but that's
absolutely not what it does. Instead it copies pending DMA traces to the
shared mailbox.

As an example, in June 2020, PR #3195 commit 6c14e76 ("trace: Log
FW ABI and hash numbers") added a tr_info() "banner" immediately after
to make sure tracing works. That included a likely misunderstood
trace_flush() call immediately after the tr_info().

Signed-off-by: Marc Herbert <[email protected]>
@@ -335,6 +338,11 @@ int dma_trace_enable(struct dma_trace_data *d)
d->enabled = 1;
schedule_task(&d->dmat_work, DMA_TRACE_PERIOD, DMA_TRACE_PERIOD);

/* it should be the very first sent log for easily identification */
tr_info(&dt_tr, "FW ABI 0x%x DBG ABI 0x%x tag " SOF_GIT_TAG " src hash 0x%x (ldc hash " META_QUOTE(SOF_SRC_HASH) ")",
SOF_ABI_VERSION, SOF_ABI_DBG_VERSION, SOF_SRC_HASH);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ktrzcinx do you remember why you put both META_QUOTE(SOF_SRC_HASH) and %x SOF_SRC_HASH here? Do they ever look different from each other?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They differ in case of using incompatible .ldc file, it's because META_QUOTE(SOF_SRC_HASH) comes from used .ldc file and SOF_SRC_HASH is .ri buildin variable.

Copy link
Collaborator

@marc-hb marc-hb Dec 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ktrzcinx !

Do they ever look different from each other?

Answering that part of my own question: they do but only when forcing sof-logger to use an incompatible .ldc file with the -n option. META_QUOTE(SOF_SRC_HASH) is in the format string which is in the .ldc file and not in the firmware binary

@marc-hb
Copy link
Collaborator

marc-hb commented Jan 11, 2023

This git hash-object technique is also creating autocrlf issues:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ABI ABI change involved
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants