Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

different sketch MD5 if built from different locations #4520

Closed
marcobrianza opened this issue Mar 15, 2018 · 20 comments
Closed

different sketch MD5 if built from different locations #4520

marcobrianza opened this issue Mar 15, 2018 · 20 comments
Assignees
Milestone

Comments

@marcobrianza
Copy link

I am using ESP.getSketchMD5() to manage software releases.

I have noticed that if I use the same Arduino IDE, but in a different place in the filesystem the sketch MD5 is different.

I have used a portable version of the IDE to be sure that the same files are used.
For example loading a sketch that prints MD5 shows this behaviour.

Is this a known thing?

void setup() {
  Serial.begin(115200);
  Serial.println();
  Serial.println( ESP.getSketchMD5());
}
void loop() {
}

@marcobrianza
Copy link
Author

marcobrianza commented Mar 15, 2018

using the strings command on a compiled binary shows some occurrence of the source path of the core, it would be nice if this could be avoided.

@d-a-v
Copy link
Collaborator

d-a-v commented Mar 15, 2018

If it's about __FILE__ gcc macro related, then I'd like to have to solution too.

@igrr
Copy link
Member

igrr commented Mar 15, 2018 via email

@Eszartek
Copy link

Eszartek commented Mar 15, 2018

The use of ESP.getSketchMD5() is interesting to me as well, so after running strings on my binaries I saw the long absolute paths being stored. Since I compile my projects on Linux using makeEspArduino.mk makefile instead of the Arduino IDE, it was trivial to edit my config.mk from:

ESP_ROOT=/home/user/Arduino/hardware/esp8266com/esp8266
LIBS=~/Arduino/hardware/esp8266com/esp8266/libraries/ ~/Arduino/libraries/

to be:

ESP_ROOT=../hardware/esp8266com/esp8266
LIBS=../hardware/esp8266com/esp8266/libraries/ ../libraries/

Then, I compiled my current project on two different machines where the working directory is different, and the resulting md5sums matched. That's cool!

However, the more interesting result for me was that my binary size was now smaller:
from:
Ram: 61220 bytes
Flash: 375172 bytes
to:
Ram: 61096 bytes
Flash: 374200 bytes

I'm always working to save a byte here and there, so this was a nice gift for very little effort.

Edit: And yes, running strings now shows all the paths stored are relative.

@d-a-v
Copy link
Collaborator

d-a-v commented Mar 15, 2018

@igrr thanks for the interesting reading !
While it should be feasible to backport it to (xtensa-)gcc-4.8.2 (at which cost?), @Eszartek's solution with makeEspArduino.mk seems very good at little cost.

@earlephilhower
Copy link
Collaborator

Not to threadjack, but if we do rebuild gcc please add in the "#define JUMP_TABLES_IN_TEXT 1" change, too, to save extra heap at the same time.

Where is the actual patch referenced in the email thread? I can't seem to find the changes themselves.

@d-a-v
Copy link
Collaborator

d-a-v commented Mar 17, 2018 via email

@earlephilhower
Copy link
Collaborator

@d-a-v The links themselves go to the mailing list archive, which talks about the patch. But where are the actual patch files they proposed? I'm not seeing any link to it.

@d-a-v
Copy link
Collaborator

d-a-v commented Mar 17, 2018

@earlephilhower in the bottom of the link in my comment, reproduced here:
https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00182.html
this is the link to the 3rd patch containing the __FILE__ management gcc patch.

@earlephilhower
Copy link
Collaborator

Thanks, @d-a-v . I saw the links but for some reason didn't see the actual changes. I'll see if they can apply to the GCC toolchain I think we're using now: https://github.com/jcmvbkbc/gcc-xtensa .

@d-a-v
Copy link
Collaborator

d-a-v commented Mar 23, 2018

There are several branches on this repo.
I guess there is a good reason, still I don't know at what cost we could have -flto.

@igrr
Copy link
Member

igrr commented Mar 23, 2018

@d-a-v -flto, unfortunately, is not only a matter of rebuilding the toolchain. For simple programs it works pretty well, but for larger programs where you assign certain function to specific sections (like, put some stuff into IRAM and some into flash), LTO has problems tracking the correct sections to use.

@d-a-v
Copy link
Collaborator

d-a-v commented Mar 23, 2018

I can't quantify flto benefit. I guess it could be huge especially for us.
Do you mean that the section-mixing flto issue is always true ?
Or is there a way to selectively use this option at the cost of splitting source files by hand to isolate and group objects from the same section together ?

@earlephilhower
Copy link
Collaborator

FWIW when testing the JUMP_TABLES_IN_ROM setting I was able to build and replace all the gcc executables dir using branch call0-4.8.2. Everything "just worked."

Whether that is, or is not, the branch being distributed I can't really say.

However, if you wanted to do FLTO tests, @d-a-v , it seems like it would be a safe starting point.

@earlephilhower
Copy link
Collaborator

earlephilhower commented Mar 24, 2018

@d-a-v It looks like the GCC patch could be applied by hand, with probably only minor mods, to the GCC-4.x in the repo. The meat of the change is in libcpp/macro.c which seems to not have undergone much editing from gcc4 to gcc7 (case BT_FILE:). Edit the local const char *name somehow (I think environment variables are not a good choice as it's hidden and not something easily used under both Linux/Mac and Windows) to remove the /.../Arduino/hardware/esp8266/xxx/ bit and you're golden.

Whether this is worth the grief, considering you'd need to rebuild the toolchain(this is a libcpp.so file) and use it on all supported platforms, is a question above my pay grade.

--edit--
How cool is it to see this at the top of a file you're looking at editing?

   Adapted to ANSI C, Richard Stallman, Jan 1987

@igrr
Copy link
Member

igrr commented Mar 24, 2018

I will apply the patch when building the new GCC version. CI builds for all platforms are set up (including ARM), just need to prepare the patch set.

@tiagopicon
Copy link

tiagopicon commented Nov 26, 2018

Hi
I have a small team (3 developers) working with Visual Studio + Visual Micro to program ESP8266.
Sometimes they work in the same proejct... So to avoid some problems I would like to check the binary file built from the same version. With the same source files I'm getting differents MD5 values from the "same" firmware. So, how can I trust which one I should use to send to production? We also work with PIC here, but we don't have this behaviour using their compiler.
What is your advice to figure out if the different binary, in fact are the same???

@earlephilhower earlephilhower self-assigned this Jan 15, 2019
@earlephilhower
Copy link
Collaborator

Now that we have GDB support (or will once the pull is merged), I don't think the GCC patch listed above is a good idea as it replaces the file path everywhere (including debug info). I don't think you'd be able to run GDB with symbols w/o serious manual effort remapping the files back to proper paths.

Remapping __FILE__, though, would not affect debugging info while making the final .bin file match between builds in different locations. The .elf will still not match, but will be usable with GDB (and we're concerned w/sketchMD5() anyway, not the ELF itself).

In GCC/gcc/final.c there's a BUILT_IN_FILE accessor we can patch using the same ideas as the above.

@earlephilhower
Copy link
Collaborator

This patch actually works, unlike the prior one, to strip paths from __FILE__ wherever it may lie, while leaving the ELF debug info untouched. Just built a test sketch with __FILE__ macros, exited Arduino, renamed the global Arduino dir, rebuilt in the new dir, and the BINs have 0 diffs while GDB has the full paths to source files.

diff --git a/libcpp/macro.c b/libcpp/macro.c
index 6d46027e4ea..0390b4c0414 100644
--- a/libcpp/macro.c
+++ b/libcpp/macro.c
@@ -275,9 +275,16 @@ _cpp_builtin_macro_text (cpp_reader *pfile, cpp_hashnode *node)
        const char *name;
        uchar *buf;

-       if (node->value.builtin == BT_FILE)
+       if (node->value.builtin == BT_FILE) {
          name = linemap_get_expansion_filename (pfile->line_table,
                                                 pfile->line_table->highest_line);
+         const char *shortname = NULL;
+#if defined (_WIN32)
+         shortname = strrchr(name, '\\');
+#endif
+         if (!shortname) shortname = strrchr(name, '/');
+         if (shortname) name = shortname;
+       }
        else
          {
            name = _cpp_get_file_name (pfile->main_file);

@earlephilhower earlephilhower added this to the 2.6.0 milestone Jan 21, 2019
@earlephilhower
Copy link
Collaborator

Latest git head has this patch in it (2.5.0-4 toolchain). Built same sources, one in ~/sketch_jul21b, one in ~/Arduino/sketch_jul21b. Same bins

earle@server:~/Arduino$ sha256sum /tmp/arduino_build_909419/sketch_jul21b.ino.bin 
8f2770e97627615c1753678474a282dabf6e9eb0a8d9b6db33cf1353acdf2db2  /tmp/arduino_build_909419/sketch_jul21b.ino.bin
earle@server:~/Arduino$ sha256sum /tmp/arduino_build_531665/sketch_jul21b.ino.bin 
8f2770e97627615c1753678474a282dabf6e9eb0a8d9b6db33cf1353acdf2db2  /tmp/arduino_build_531665/sketch_jul21b.ino.bin

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants