RFC: native_posix fails to compile on macOS #37123

@cfriedt cfriedt commented Jul 22, 2021

This was a bit of a hack, but it really highlights some of the things that we take for granted with Linux and otherwise with ELF-based targets.

Specifically, on macOS:

  • the Mach-O binary file format lacks:
    • weak symbol support
    • support for section names > 16 characters
  • The assembler (llvm) does not support the .type directive
  • The --whole-archive option is not supported by the macOS linker
  • va_list size is inconsistent for cbprintf:
    • on macOS (64-bit), sizeof(va_list) is 8 while sizeof(struct __va_list) is 32

The Mach-O binary format limitations currently present the greatest long-term challenge to targeting macOS for native_posix_64.

Weak Symbol Support

The __attribute__((weak_import)) is supposedly supported on macOS, but at the time this commit was made, it did absolutely nothing. Perhaps there is some other special applesauce that was missing 🤷.

Often, weak symbols are used in place of explicit infrastructure for overriding function implementations or symbol values. Weak symbols are actually not a standard feature of C and are mainly an ELF convenience. As a result, Zephyr will not be fully portable to different toolchains until weak symbols are removed entirely.

Section Name Limitations

In addition to a section name, macOS also relies on a segment name. The combination of the two looks like this:

__attribute__((section("segment_name,section_name")))

Both the segment_name and section_name are limited to 1 <= n <= 16 ASCII characters in length. ELF solves that problem by placing section names into the string table. Perhaps that could be a long-term solution if the proper patches could be supplied to LLVM and Apple, but adoption of that is unlikely. It's quite likely that the same proposal has already been made a number of times.
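The length rule is easy to check in a script before the compiler ever sees the attribute. The sketch below is illustrative only: `check_macho_section_spec` is a hypothetical helper, not part of Zephyr or of any toolchain, and it checks nothing beyond the 1 to 16 character limit on each half of the specifier.

```python
MACHO_NAME_MAX = 16  # segment and section names are each capped at 16 characters


def check_macho_section_spec(spec):
    """Return a list of problems found in a "segment,section" specifier.

    Hypothetical helper for illustration; an empty list means both names
    satisfy the length limit described above.
    """
    parts = spec.split(",")
    if len(parts) < 2:
        return ["missing segment name (expected 'segment,section')"]

    problems = []
    for kind, name in (("segment", parts[0]), ("section", parts[1])):
        if not 1 <= len(name) <= MACHO_NAME_MAX:
            problems.append(
                f"{kind} name {name!r} is {len(name)} characters long; "
                f"must be between 1 and {MACHO_NAME_MAX}"
            )
    return problems


if __name__ == "__main__":
    for spec in ("__DATA,z_macho_tmp", "__DATA,.native_PRE_BOOT_11_task"):
        print(spec, "->", check_macho_section_spec(spec) or "ok")
```

A check like this could flag offending names early, but it does not solve the underlying problem of names that are generated by macro-pasting.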

Currently, we rely a great deal on Z_ITERABLE_SECTION() and Z_STRUCT_SECTION_FOREACH(). It does not seem like a realistic expectation to rely on the C preprocessor and linker to perform all of the heavy lifting necessary for processing sections on macOS.

In addition to section names that are defined dynamically via macro-pasting (set at compile time), we also have statically defined sections that are just defined in source code without any macro-pasting (set prior to compile time). There is some work currently being done on CMake generated linker scripts #36140 where the section naming is brought out of C and placed in the build system, which would help for the statically defined sections.

For dynamic section names that are set at compile-time, currently there are 2 possible solutions.

Possible Solution No. 1

It may be possible to solve this problem by creating a new macro that behaves slightly differently for ELF vs Mach-O targets; on ELF, it could create the section name as usual, but with Mach-O, it could use the __attribute__((constructor)) to copy the relevant details to an unordered map for later processing. A slight modification of this approach could be used to sort each item into an ordered map.

If we then transition Z_STRUCT_SECTION_FOREACH() to a runtime function call with a callback, then the ELF implementation could operate more or less as usual, but the Mach-O implementation could be adjusted to simply process the previously updated data structures.

void __z_struct_section_foreach(const char *section, void (*cb)(void *));

There is a proof of concept working for that in this PR.

Possible Solution No. 2

Another possible solution for dynamic sections is to add symbol metadata in the form of strings and then perform some post-processing of those symbols and sections, similarly to what we do now with the various ELF hack scripts.

For this approach, my thoughts are:

  1. Place all such symbols into __attribute__((section("__DATA,z_macho_tmp")))
  2. Encode "symbol_name,intended_elf_section" tuple as a const string in __attribute__((section("__DATA,z_macho_tuple")))
    a. possibly some other construct that guarantees uniqueness
  3. In postprocessing, use the hash of the intended ELF section as a new section name
    a. so e.g. __z_foo_init_PRIORITY_BOOT_1_foo => a57c313908dbe or something
  4. Copy the symbol to the new section
  5. Repeat for each tuple
  6. Discard z_macho_tmp section
  7. Discard z_macho_tuple section
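Steps 2 through 5 amount to a small grouping pass over the tuple strings. The sketch below assumes the tuples have already been extracted from the binary as plain "symbol_name,intended_elf_section" strings. The names `hashed_section_name` and `plan_sections` are hypothetical, and the choice of a truncated SHA-256 hex digest is an assumption, since no hash function has been picked yet.

```python
import hashlib
from collections import defaultdict


def hashed_section_name(elf_section, length=16):
    """Derive a fixed-length stand-in for an over-long ELF section name.

    Assumption: a truncated SHA-256 hex digest; the proposal does not
    say which hash to use.
    """
    return hashlib.sha256(elf_section.encode("utf-8")).hexdigest()[:length]


def plan_sections(tuples):
    """Map each hashed section name to the symbols that should move into it."""
    plan = defaultdict(list)
    for entry in tuples:
        # Split on the first comma only: symbol name, then intended section.
        symbol, elf_section = entry.split(",", 1)
        plan[hashed_section_name(elf_section)].append(symbol)
    return dict(plan)


if __name__ == "__main__":
    # Made-up symbol names, for illustration only.
    example = [
        "foo_init,__z_foo_init_PRIORITY_BOOT_1_foo",
        "bar_init,__z_foo_init_PRIORITY_BOOT_1_foo",
        "baz_init,__z_baz_init_PRIORITY_BOOT_2_baz",
    ]
    for section, symbols in plan_sections(example).items():
        print(section, symbols)
```

One thing to keep in mind: a hash discards the lexical ordering that the original section names carry, so any ordering between sections would have to come from the tuple metadata rather than from the hashed names.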

Useful Links

Details about the Mach-O binary format and linker-generated sections can
be found at the links below:
https://opensource.apple.com/source/xnu/xnu-4903.221.2/EXTERNAL_HEADERS/mach-o/loader.h.auto.html
https://github.com/aidansteele/osx-abi-macho-file-format-reference
https://stackoverflow.com/questions/17669593/how-to-get-a-pointer-to-a-binary-section-in-mac-os-x

Fixes #10945

@github-actions github-actions bot added the area: API Changes to public APIs label Jul 22, 2021
@cfriedt cfriedt changed the title toolchain: do not declare .type when compiling with llvm [WIP]: native_posix fails to compile on macOS Jul 22, 2021

cfriedt commented Jul 22, 2021

Currently failing with

[8/92] Generating include/generated/offsets.h
FAILED: zephyr/include/generated/offsets.h 
cd /Users/cfriedt/workspace/zephyrproject/zephyr/build/zephyr && /opt/homebrew/opt/python@3.9/bin/python3.9 /Users/cfriedt/workspace/zephyrproject/zephyr/scripts/gen_offset_header.py -i /Users/cfriedt/workspace/zephyrproject/zephyr/build/zephyr/CMakeFiles/offsets.dir/./arch/posix/core/offsets/offsets.c.obj -o /Users/cfriedt/workspace/zephyrproject/zephyr/build/zephyr/include/generated/offsets.h
Traceback (most recent call last):
  File "/Users/cfriedt/workspace/zephyrproject/zephyr/scripts/gen_offset_header.py", line 83, in <module>
    ret = gen_offset_header(args.input, input_file, output_file)
  File "/Users/cfriedt/workspace/zephyrproject/zephyr/scripts/gen_offset_header.py", line 41, in gen_offset_header
    obj = ELFFile(input_file)
  File "/opt/homebrew/lib/python3.9/site-packages/elftools/elf/elffile.py", line 73, in __init__
    self._identify_file()
  File "/opt/homebrew/lib/python3.9/site-packages/elftools/elf/elffile.py", line 482, in _identify_file
    elf_assert(magic == b'\x7fELF', 'Magic number does not match')
  File "/opt/homebrew/lib/python3.9/site-packages/elftools/common/utils.py", line 77, in elf_assert
    _assert_with_exception(cond, msg, ELFError)
  File "/opt/homebrew/lib/python3.9/site-packages/elftools/common/utils.py", line 114, in _assert_with_exception
    raise exception_type(msg)
elftools.common.exceptions.ELFError: Magic number does not match
ninja: build stopped: subcommand failed.

cfriedt commented Jul 22, 2021

Need to look into a Mach-O tool to use instead of pyelftools on macOS inside scripts/gen_offset_header.py.

PyPI has macholib, although its source code link seems to be incorrect:
https://pypi.org/project/macholib/
https://github.com/ronaldoussoren/macholib
https://macholib.readthedocs.io/en/latest/
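Whatever Mach-O reader ends up being used, the scripts would first need to tell the two formats apart instead of failing inside pyelftools. A minimal sketch of that check follows, assuming only the well-known magic numbers; `identify_object_format` is a hypothetical helper and not something the Zephyr scripts contain today.

```python
import io

ELF_MAGIC = b"\x7fELF"

# First four bytes of a Mach-O file, as they appear on disk.
MACHO_MAGICS = {
    b"\xfe\xed\xfa\xce": "macho32",  # MH_MAGIC, big-endian file
    b"\xce\xfa\xed\xfe": "macho32",  # MH_MAGIC, little-endian file
    b"\xfe\xed\xfa\xcf": "macho64",  # MH_MAGIC_64, big-endian file
    b"\xcf\xfa\xed\xfe": "macho64",  # MH_MAGIC_64, little-endian file
    b"\xca\xfe\xba\xbe": "fat",      # FAT_MAGIC, universal binary
}


def identify_object_format(stream):
    """Classify a binary stream as "elf", a Mach-O variant, or "unknown"."""
    start = stream.tell()
    magic = stream.read(4)
    stream.seek(start)  # leave the stream where we found it
    if magic == ELF_MAGIC:
        return "elf"
    return MACHO_MAGICS.get(magic, "unknown")


if __name__ == "__main__":
    # In-memory stand-ins for real object files.
    print(identify_object_format(io.BytesIO(b"\x7fELF\x02\x01\x01\x00")))
    print(identify_object_format(io.BytesIO(b"\xcf\xfa\xed\xfe\x0c\x00\x00\x01")))
```

The stream position is restored so that the same file object can then be passed on to whichever parser matches.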

@stephanosio stephanosio requested a review from aescolar July 22, 2021 14:33
@cfriedt cfriedt force-pushed the issue/10945/native-posix-fails-to-compile-on-macos branch 2 times, most recently from 70c82da to 114a77d Compare July 24, 2021 01:35

cfriedt commented Jul 24, 2021

macholib was actually kind of old and unmaintained so I just ran nm -g <file> and did some screen-scraping.

Now it just fails with errors about Mach-O sections / segments.

More info:

https://stackoverflow.com/questions/17669593/how-to-get-a-pointer-to-a-binary-section-in-mac-os-x
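For reference, the screen-scraping of nm -g output can be sketched roughly as follows. This is not the code from the PR: `parse_nm_output` is a hypothetical name, and the sketch assumes the usual text layout of an address column, a one-letter type, and the symbol name, with the address column left empty for undefined symbols.

```python
def parse_nm_output(text):
    """Parse `nm -g` text output into {symbol_name: (address_or_None, type)}."""
    symbols = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            address, sym_type, name = fields
            symbols[name] = (int(address, 16), sym_type)
        elif len(fields) == 2:
            # Undefined symbols have no address column.
            sym_type, name = fields
            symbols[name] = (None, sym_type)
        # Anything else (blank lines, archive member headers) is skipped.
    return symbols


if __name__ == "__main__":
    # Made-up sample input, for illustration only.
    sample = (
        "0000000000000010 A _example_OFFSET\n"
        "                 U _example_undefined\n"
    )
    for name, (address, sym_type) in parse_nm_output(sample).items():
        print(name, sym_type, address)
```

In a real script the text would come from running nm on the object file, for example through `subprocess.run(..., capture_output=True, text=True)`.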

cfriedt commented Jul 24, 2021

After prefixing __TEXT, or __DATA, in a few places that define sections, the predominant error seen is

error: argument to 'section' attribute is not valid for this target: mach-o section specifier requires a section whose length is between 1 and 16 characters

Just to recap, Mach-O / Darwin sections look something like __TEXT,__foo_bar_baz. The part that is __foo_bar_baz may contain [a-zA-Z0-9._], and its length must be between 1 and 16 characters.

https://opensource.apple.com/source/xnu/xnu-4903.221.2/EXTERNAL_HEADERS/mach-o/loader.h.auto.html

So nasty... why not just use an index into a string table, like ELF?

@github-actions github-actions bot added the area: native port Host native arch port (native_sim) label Jul 24, 2021
@cfriedt cfriedt force-pushed the issue/10945/native-posix-fails-to-compile-on-macos branch from 4dcf6ec to b764161 Compare July 24, 2021 23:42

cfriedt commented Jul 24, 2021

Some static functions might be getting discarded, but it's almost linking now.

Getting a few warnings like this:

warning: '__weak' only applies to pointer types; type here is 'void' [-Wignored-attributes]

And then this error:

[86/91] Linking C executable zephyr/zephyr_prebuilt.elf
FAILED: zephyr/zephyr_prebuilt.elf zephyr/zephyr_prebuilt.map 
: && ccache /usr/bin/gcc   zephyr/CMakeFiles/zephyr_prebuilt.dir/misc/empty_file.c.obj -o zephyr/zephyr_prebuilt.elf  -Wl,-T  zephyr/linker_zephyr_prebuilt.cmd  -Wl,-Map=/Users/cfriedt/workspace/zephyrproject/zephyr/build/zephyr/zephyr_prebuilt.map  -Wl,--whole-archive  app/libapp.a  zephyr/libzephyr.a  zephyr/arch/arch/posix/core/libarch__posix__core.a  zephyr/soc/posix/inf_clock/libsoc__posix__inf_clock.a  zephyr/boards/posix/native_posix/libboards__posix__native_posix.a  -Wl,--no-whole-archive  zephyr/kernel/libkernel.a  zephyr/CMakeFiles/offsets.dir/./arch/posix/core/offsets/offsets.c.obj  -L/Users/cfriedt/workspace/zephyrproject/zephyr/build/zephyr  -Wl,-u,_OffsetAbsSyms  -Wl,-u,_ConfigAbsSyms  -m64  -ldl  -pthread  -lm && cd /Users/cfriedt/workspace/zephyrproject/zephyr/build/zephyr && /opt/homebrew/Cellar/cmake/3.19.6/bin/cmake -E echo
ld: unknown option: -T
clang: error: linker command failed with exit code 1 (use -v to see invocation)
ninja: build stopped: subcommand failed.
FATAL ERROR: command exited with status 1: /opt/homebrew/bin/cmake --build /Users/cfriedt/workspace/zephyrproject/zephyr/build

@cfriedt cfriedt force-pushed the issue/10945/native-posix-fails-to-compile-on-macos branch 4 times, most recently from 3b53621 to d1d0036 Compare July 25, 2021 00:37

cfriedt commented Jul 25, 2021

Just about linking...

ccache /usr/bin/gcc   zephyr/CMakeFiles/zephyr_prebuilt.dir/misc/empty_file.c.obj -o zephyr/zephyr_prebuilt.elf -all_load app/libapp.a  zephyr/libzephyr.a  zephyr/arch/arch/posix/core/libarch__posix__core.a  zephyr/soc/posix/inf_clock/libsoc__posix__inf_clock.a  zephyr/boards/posix/native_posix/libboards__posix__native_posix.a  zephyr/kernel/libkernel.a  zephyr/CMakeFiles/offsets.dir/./arch/posix/core/offsets/offsets.c.obj  -L/Users/cfriedt/workspace/zephyrproject/zephyr/build/zephyr  -Wl,-u,__OffsetAbsSyms  -Wl,-u,__ConfigAbsSyms  -m64  -ldl  -pthread  -lm && cd /Users/cfriedt/workspace/zephyrproject/zephyr/build/zephyr && /opt/homebrew/Cellar/cmake/3.19.6/bin/cmake -E echo
Undefined symbols for architecture arm64:
  "___bss_end", referenced from:
      _z_bss_zero in libkernel.a(init.c.obj)
  "___bss_start", referenced from:
      _z_bss_zero in libkernel.a(init.c.obj)
  "___device_end", referenced from:
      _z_device_state_init in libkernel.a(device.c.obj)
      _z_impl_device_get_binding in libkernel.a(device.c.obj)
      _z_device_get_all_static in libkernel.a(device.c.obj)
      _device_required_foreach in libkernel.a(device.c.obj)
  "___device_start", referenced from:
      _z_device_state_init in libkernel.a(device.c.obj)
      _z_impl_device_get_binding in libkernel.a(device.c.obj)
      _z_device_get_all_static in libkernel.a(device.c.obj)
      _device_required_foreach in libkernel.a(device.c.obj)
  "___init_APPLICATION_start", referenced from:
      _z_sys_init_run_level.levels in libkernel.a(device.c.obj)
  "___init_POST_KERNEL_start", referenced from:
      _z_sys_init_run_level.levels in libkernel.a(device.c.obj)
  "___init_PRE_KERNEL_1_start", referenced from:
      _z_sys_init_run_level.levels in libkernel.a(device.c.obj)
  "___init_PRE_KERNEL_2_start", referenced from:
      _z_sys_init_run_level.levels in libkernel.a(device.c.obj)
  "___init_end", referenced from:
      _z_sys_init_run_level.levels in libkernel.a(device.c.obj)
  "___native_FIRST_SLEEP_tasks_start", referenced from:
      _run_native_tasks.native_pre_tasks in libsoc__posix__inf_clock.a(soc.c.obj)
  "___native_ON_EXIT_tasks_start", referenced from:
      _posix_soc_clean_up in libsoc__posix__inf_clock.a(soc.c.obj)
      _run_native_tasks.native_pre_tasks in libsoc__posix__inf_clock.a(soc.c.obj)
  "___native_PRE_BOOT_1_tasks_start", referenced from:
      _run_native_tasks.native_pre_tasks in libsoc__posix__inf_clock.a(soc.c.obj)
  "___native_PRE_BOOT_2_tasks_start", referenced from:
      _run_native_tasks.native_pre_tasks in libsoc__posix__inf_clock.a(soc.c.obj)
  "___native_PRE_BOOT_3_tasks_start", referenced from:
      _run_native_tasks.native_pre_tasks in libsoc__posix__inf_clock.a(soc.c.obj)
  "___native_tasks_end", referenced from:
      _posix_soc_clean_up in libsoc__posix__inf_clock.a(soc.c.obj)
      _run_native_tasks.native_pre_tasks in libsoc__posix__inf_clock.a(soc.c.obj)
  "___static_thread_data_list_end", referenced from:
      _z_init_static_threads in libkernel.a(thread.c.obj)
  "___static_thread_data_list_start", referenced from:
      _z_init_static_threads in libkernel.a(thread.c.obj)
  "__k_heap_list_end", referenced from:
      _statics_init in libkernel.a(kheap.c.obj)
  "__k_heap_list_start", referenced from:
      _statics_init in libkernel.a(kheap.c.obj)
  "__k_mem_slab_list_end", referenced from:
      _init_mem_slab_module in libkernel.a(mem_slab.c.obj)
  "__k_mem_slab_list_start", referenced from:
      _init_mem_slab_module in libkernel.a(mem_slab.c.obj)
ld: symbol(s) not found for architecture arm64
clang: error: linker command failed with exit code 1 (use -v to see invocation)

@cfriedt cfriedt force-pushed the issue/10945/native-posix-fails-to-compile-on-macos branch 3 times, most recently from e1315f2 to f41cea9 Compare July 25, 2021 01:43

cfriedt commented Jul 25, 2021

Added a few fake section names as symbols. Will need to figure out what to do there soon.

Looks like the issue with "__weak" is that the weak versions of symbols are not getting discarded.

% ccache /usr/bin/gcc   zephyr/CMakeFiles/zephyr_prebuilt.dir/misc/empty_file.c.obj -o zephyr/zephyr_prebuilt.elf -all_load app/libapp.a  zephyr/libzephyr.a  zephyr/arch/arch/posix/core/libarch__posix__core.a  zephyr/soc/posix/inf_clock/libsoc__posix__inf_clock.a  zephyr/boards/posix/native_posix/libboards__posix__native_posix.a  zephyr/kernel/libkernel.a  zephyr/CMakeFiles/offsets.dir/./arch/posix/core/offsets/offsets.c.obj  -L/Users/cfriedt/workspace/zephyrproject/zephyr/build/zephyr  -Wl,-u,__OffsetAbsSyms  -Wl,-u,__ConfigAbsSyms  -m64  -ldl  -pthread  -lm && cd /Users/cfriedt/workspace/zephyrproject/zephyr/build/zephyr && /opt/homebrew/Cellar/cmake/3.19.6/bin/cmake -E echo
duplicate symbol '_sys_clock_set_timeout' in:
    zephyr/libzephyr.a(sys_clock_init.c.obj)
    zephyr/libzephyr.a(native_posix_timer.c.obj)
duplicate symbol '_sys_clock_driver_init' in:
    zephyr/libzephyr.a(sys_clock_init.c.obj)
    zephyr/libzephyr.a(native_posix_timer.c.obj)
duplicate symbol '_arch_system_halt' in:
    zephyr/arch/arch/posix/core/libarch__posix__core.a(fatal.c.obj)
    zephyr/kernel/libkernel.a(fatal.c.obj)
duplicate symbol '_zephyr_app_main' in:
    app/libapp.a(main.c.obj)
    zephyr/kernel/libkernel.a(init.c.obj)
ld: 4 duplicate symbols for architecture arm64
clang: error: linker command failed with exit code 1 (use -v to see invocation)

cfriedt commented Jul 25, 2021

Getting a bit further now - just flaking out with scripts/gen_handles.py.

[1/93] Preparing syscall dependency handling

[77/93] Building C object zephyr/kernel/CMakeFiles/kernel.dir/sched.c.obj
/Users/cfriedt/workspace/zephyrproject/zephyr/kernel/sched.c:220:20: warning: unused function 'is_aborting' [-Wunused-function]
static inline bool is_aborting(struct k_thread *thread)
                   ^
/Users/cfriedt/workspace/zephyrproject/zephyr/kernel/sched.c:861:20: warning: unused function 'set_current' [-Wunused-function]
static inline void set_current(struct k_thread *new_thread)
                   ^
2 warnings generated.
[87/93] Linking C executable zephyr/zephyr_prebuilt.elf
ld: warning: ignoring file zephyr/linker_zephyr_prebuilt.cmd, building for macOS-arm64 but attempting to link with file built for unknown-unsupported file format ( 0x0A 0x0A 0x0A 0x0A 0x0A 0x0A 0x0A 0x0A 0x0A 0x0A 0x0A 0x0A 0x0A 0x0A 0x0A 0x0A )

[89/93] Generating dev_handles.c
FAILED: zephyr/dev_handles.c 
cd /Users/cfriedt/workspace/zephyrproject/zephyr/build/zephyr && /opt/homebrew/opt/python@3.9/bin/python3.9 /Users/cfriedt/workspace/zephyrproject/zephyr/scripts/gen_handles.py --output-source dev_handles.c --kernel /Users/cfriedt/workspace/zephyrproject/zephyr/build/zephyr/zephyr_prebuilt.elf --zephyr-base /Users/cfriedt/workspace/zephyrproject/zephyr
Traceback (most recent call last):
  File "/Users/cfriedt/workspace/zephyrproject/zephyr/scripts/gen_handles.py", line 331, in <module>
    main()
  File "/Users/cfriedt/workspace/zephyrproject/zephyr/scripts/gen_handles.py", line 166, in main
    elf = ELFFile(open(args.kernel, "rb"))
  File "/opt/homebrew/lib/python3.9/site-packages/elftools/elf/elffile.py", line 73, in __init__
    self._identify_file()
  File "/opt/homebrew/lib/python3.9/site-packages/elftools/elf/elffile.py", line 482, in _identify_file
    elf_assert(magic == b'\x7fELF', 'Magic number does not match')
  File "/opt/homebrew/lib/python3.9/site-packages/elftools/common/utils.py", line 77, in elf_assert
    _assert_with_exception(cond, msg, ELFError)
  File "/opt/homebrew/lib/python3.9/site-packages/elftools/common/utils.py", line 114, in _assert_with_exception
    raise exception_type(msg)
elftools.common.exceptions.ELFError: Magic number does not match
ninja: build stopped: subcommand failed.
FATAL ERROR: command exited with status 1: /opt/homebrew/bin/cmake --build /Users/cfriedt/workspace/zephyrproject/zephyr/build --target run

@cfriedt cfriedt force-pushed the issue/10945/native-posix-fails-to-compile-on-macos branch from 8a022fe to 9e738af Compare July 25, 2021 22:37
@cfriedt cfriedt added RFC Request For Comments: want input from the community DNM This PR should not be merged (Do Not Merge) labels Jul 26, 2021
@cfriedt cfriedt requested a review from tejlmand July 26, 2021 16:35
@cfriedt cfriedt force-pushed the issue/10945/native-posix-fails-to-compile-on-macos branch 6 times, most recently from 5fb5e33 to b96f6f7 Compare August 2, 2021 14:24
@galak galak removed the dev-review To be discussed in dev-review meeting label Aug 5, 2021

@tejlmand tejlmand left a comment

Thanks for this.

As a draft it looks good, but I have a couple of comments regarding the build system that we must work on before taking it to a mergeable state.

Regarding the 1 <= len(section_name) <= 16 issue, I do believe #36140 will give us a way to handle that limitation, but I believe that should be investigated after #36140 has completed.

@@ -1461,6 +1464,8 @@ if(CONFIG_OUTPUT_DISASSEMBLE_ALL)
)
endif()

# probably some equivalent command exists to read stats from a Mach-O binary
if (NOT (${CMAKE_HOST_SYSTEM_NAME} STREQUAL "Darwin" AND CONFIG_ARCH_POSIX))
Collaborator

We should avoid testing for Darwin and ARCH_POSIX everywhere.

We already have an infrastructure in place regarding compiler and linker flags, as well as bintools commands / flags / scripts.
See more here:
https://github.com/zephyrproject-rtos/zephyr/tree/main/cmake/toolchain
https://github.com/zephyrproject-rtos/zephyr/tree/main/cmake/compiler
https://github.com/zephyrproject-rtos/zephyr/tree/main/cmake/bintools
https://github.com/zephyrproject-rtos/zephyr/tree/main/cmake/linker

Of course that is not directly related to host, mostly the toolchain / compiler / linker, but we do have host-gnu which basically means Linux (I haven't tested if host works on Windows, but I don't think it does).
So it could hint that host-gnu should support a subidentifier with system name, Linux, Darwin, Windows.

I believe we should try to see how host specific knowledge can fit into that design.
(Of course this code is fine for exploration work)

@cfriedt cfriedt Aug 10, 2021

@tejlmand - Yea, this PR is very much a hack, so not very pretty.

On macOS, the native toolchain is not GNU-based, so it might not fall into that category precisely. There is a gcc, but it is just a copy of clang (not even a symlink), and it uses Apple's ld.

md5sum /usr/bin/clang /usr/bin/gcc
ad07b41d86b6cff03436930eed7d3b8a  /usr/bin/clang
ad07b41d86b6cff03436930eed7d3b8a  /usr/bin/gcc

Homebrew does provide a gcc and binutils for native development. I'll see how far that gets us..

Comment on lines +82 to +113
if (CONFIG_ARCH_POSIX AND ${CMAKE_HOST_SYSTEM_NAME} STREQUAL "Darwin")
set(symbol_prefix "_")
else()
set(symbol_prefix "")
endif()

foreach(symbol ${ARGN})
zephyr_link_libraries(${LINKERFLAGPREFIX},-u,${symbol})
zephyr_link_libraries(${LINKERFLAGPREFIX},-u,${symbol_prefix}${symbol})
Collaborator

This should be described through a linker property, for example:

set_linker_property(TARGET linker APPEND PROPERTY symbol_prefix _)

inside:
https://github.com/zephyrproject-rtos/zephyr/blob/main/cmake/linker/ld/host-gcc/linker_flags.cmake
(again, we need to extend current design to take host system into consideration)

Then this code would simply become:

Suggested change
if (CONFIG_ARCH_POSIX AND ${CMAKE_HOST_SYSTEM_NAME} STREQUAL "Darwin")
set(symbol_prefix "_")
else()
set(symbol_prefix "")
endif()
foreach(symbol ${ARGN})
zephyr_link_libraries(${LINKERFLAGPREFIX},-u,${symbol})
zephyr_link_libraries(${LINKERFLAGPREFIX},-u,${symbol_prefix}${symbol})
zephyr_link_libraries(${LINKERFLAGPREFIX},-u,$<TARGET_PROPERTY:linker,symbol_prefix>${symbol})

Note: #24851 focused on the compiler; we still need to clean up linker functions like toolchain_ld_force_undefined_symbols in a similar way.

@cfriedt cfriedt Aug 14, 2021

@tejlmand - inside cmake/linker/ld/host-gcc/linker_flags.cmake, if I add

set_linker_property(TARGET linker APPEND PROPERTY symbol_prefix _)

it results in a CMake error

Unknown CMake command "set_linker_property".

Do you have another PR in progress that allows us to call set_linker_property()?

@@ -109,18 +115,30 @@ function(toolchain_ld_link_elf)
set(use_linker "-fuse-ld=bfd")
endif()

if (CONFIG_ARCH_POSIX AND ${CMAKE_HOST_SYSTEM_NAME} STREQUAL "Darwin")
Collaborator

Same comment regarding linker flags.

@cfriedt cfriedt force-pushed the issue/10945/native-posix-fails-to-compile-on-macos branch 2 times, most recently from ea84138 to 8b8d578 Compare August 14, 2021 14:07

cfriedt commented Aug 14, 2021

I installed homebrew's gcc and binutils with

brew install gcc binutils
export PATH=/opt/homebrew/Cellar/gcc/11.2.0/bin:$PATH
export PATH=/opt/homebrew/opt/binutils/bin:$PATH

A west build now shows this:

...
-- The C compiler identification is GNU 11.1.0
-- The CXX compiler identification is GNU 11.1.0
-- The ASM compiler identification is GNU
-- Found assembler: /opt/homebrew/Cellar/gcc/11.2.0/bin/gcc
...

Incrementally applying the hacks from this PR to see if there are any gains from using a GNU toolchain rather than the native Clang toolchain:

[7/92] Building C object zephyr/CMakeFiles/offsets.dir/arch/posix/core/offsets/offsets.c.obj
FAILED: zephyr/CMakeFiles/offsets.dir/arch/posix/core/offsets/offsets.c.obj
...
/var/folders/n_/mrhy6lxx2x14l1wk3b9rlw4r0000gn/T//ccbM7cV1.s:14:2: error: unknown directive
        .type   ___cpu_t_current_OFFSET,@object

The .type directive issue also exists with gcc, so I guess it has more to do with Mach-O than with the compiler. That condition should probably be changed to APPLE && MACH.

The ssize_t and off_t issues exist in both the llvm and gcc cases, so that fixup should probably go into gcc.h.

There did not seem to be any complaints about the weak attribute, which I was a bit suspicious of, but the section error presented itself slightly differently as

[53/92] Building C object zephyr/boards/posix/native_posix/CMakeFiles/boards__posix__native_posix.dir/timer_model.c.obj
FAILED: zephyr/boards/posix/native_posix/CMakeFiles/boards__posix__native_posix.dir/timer_model.c.obj 
...
/var/folders/n_/mrhy6lxx2x14l1wk3b9rlw4r0000gn/T//cck8A6SR.s:1165:11: error: mach-o section specifier requires a section whose length is between 1 and 16 characters
        .section __DATA,.native_PRE_BOOT_11_task
                 ^

Without the "__DATA," prefix (or any segment identifier), the assembler would complain that there was an unexpected token.

Actually, there was no warning at all about weak symbols with gcc and it appears that it just linked whatever came first, which is arguably worse (no warning and unexpected behaviour).

The homebrew binutils does not provide any ld - it's still the Apple ld that is used even in that case, so there is no gain switching to a GNU toolchain to reduce the number of toolchain quirks.

I think I'll opt to use the native tools for now, as that is what is primarily supported by Apple and used by Homebrew.

Will make changes requested by @tejlmand shortly and update this PR as I go along.

@cfriedt cfriedt force-pushed the issue/10945/native-posix-fails-to-compile-on-macos branch from 8b8d578 to d684da3 Compare August 14, 2021 15:35
@cfriedt cfriedt requested review from galak and tejlmand August 14, 2021 15:38
@cfriedt cfriedt force-pushed the issue/10945/native-posix-fails-to-compile-on-macos branch 3 times, most recently from 2069e54 to cd6e158 Compare August 19, 2021 00:32
@cfriedt cfriedt force-pushed the issue/10945/native-posix-fails-to-compile-on-macos branch 2 times, most recently from 6a2eaa5 to 012fb1d Compare September 6, 2021 08:48
This was a bit of a hack, but it really highlights some of the things
that we take for granted with Linux and otherwise with ELF-based
targets.

Specifically, on macOS:

* the Mach-O binary file format:
  * lacks weak symbol support
  * lacks support for section names > 16 characters
  * prefixes all symbols with `_`
* The assembler (both llvm & GNU) does not support the `.type` directive
* Linker options are different with Apple's ld
  * unsupported: `--whole-archive` / `--no-whole-archive`
  * `-Map` becomes `-map`
  * linker script format is completely different
  * macOS uses `-Wl,-order_file` instead of `-Wl,-T` for linker script

Details about the Mach-O binary format and linker-generated sections can
be found at the links below:
https://opensource.apple.com/source/xnu/xnu-4903.221.2/EXTERNAL_HEADERS/mach-o/loader.h.auto.html
https://github.com/aidansteele/osx-abi-macho-file-format-reference
https://stackoverflow.com/questions/17669593/how-to-get-a-pointer-to-a-binary-section-in-mac-os-x

Fixes zephyrproject-rtos#10945

Signed-off-by: Christopher Friedt <[email protected]>
@cfriedt cfriedt force-pushed the issue/10945/native-posix-fails-to-compile-on-macos branch from 012fb1d to 8856971 Compare September 16, 2021 23:22

cfriedt commented Sep 19, 2021

I wrote a machotools Python package (took a bit of work, very loosely based on pyelftools) and was able to recover all of the metadata necessary to reconstruct contiguous regions by composing one of the structs below for each element in any iterable section.

struct z_macho_tuple {
  const char *elf_section_name;
  void *symbol_addr;
  size_t symbol_size;
};
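To give an idea of how such records can be read back out of a binary, here is a minimal sketch using only the standard library. It is not the machotools API, which is not shown in this thread. `Z_MACHO_TUPLE` and `unpack_tuples` are hypothetical names, and the layout (two 8-byte pointers followed by an 8-byte size_t, little-endian, no padding) is an assumption that matches a typical 64-bit macOS target.

```python
import struct

# Assumed layout of struct z_macho_tuple on a 64-bit little-endian target:
# const char *elf_section_name; void *symbol_addr; size_t symbol_size;
Z_MACHO_TUPLE = struct.Struct("<QQQ")


def unpack_tuples(section_bytes):
    """Decode raw section contents into (name_ptr, symbol_addr, symbol_size)."""
    if len(section_bytes) % Z_MACHO_TUPLE.size:
        raise ValueError("section size is not a multiple of the record size")
    return list(Z_MACHO_TUPLE.iter_unpack(section_bytes))


if __name__ == "__main__":
    # Made-up pointer value for the name; address and size are illustrative.
    raw = Z_MACHO_TUPLE.pack(0x100003F00, 0x100008020, 8)
    for name_ptr, addr, size in unpack_tuples(raw):
        print(f"name_ptr: {name_ptr:x}, symbol_addr: {addr:x}, symbol_size: {size}")
```

Note that the first field is still only a pointer at this stage; turning it into a section name string means looking up that address in the binary's string data.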

In a contrived example, the output of ./macho_map.py -i foo -o foo.exe gives

z_macho_tuple: section_name: .native_PRE_BOOT_1_prio_0_task, symbol_addr: 100008020, symbol_size: 8
z_macho_tuple: section_name: .native_PRE_BOOT_1_prio_1_task, symbol_addr: 100008028, symbol_size: 8
z_macho_tuple: section_name: .native_PRE_BOOT_1_prio_2_task, symbol_addr: 100008030, symbol_size: 8
z_macho_tuple: section_name: .native_PRE_BOOT_3_prio_0_task, symbol_addr: 100008038, symbol_size: 8
z_macho_tuple: section_name: .native_PRE_BOOT_3_prio_1_task, symbol_addr: 100008040, symbol_size: 8
z_macho_tuple: section_name: .native_PRE_BOOT_3_prio_2_task, symbol_addr: 100008048, symbol_size: 8
z_macho_tuple: section_name: _static_thread_data, symbol_addr: 100008158, symbol_size: 88
z_macho_tuple: section_name: _static_thread_data, symbol_addr: 1000081b0, symbol_size: 88
z_macho_tuple: section_name: _static_thread_data, symbol_addr: 100008208, symbol_size: 88

Now, each of these symbols has been placed in a custom section called z_macho_map. However, in the first stage, they are both ungrouped and unordered (if they seem ordered above, it's purely coincidental to the contrived example).
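The grouping and ordering that has to happen afterwards can be sketched in a few lines. This is a simplified illustration rather than the code in the PoC: `group_tuples` is a hypothetical name, and sorting groups by section name and entries by address is an assumption about the desired order.

```python
from collections import defaultdict


def group_tuples(tuples):
    """Group (section_name, symbol_addr, symbol_size) records by section.

    Returns a list of (section_name, records) pairs sorted by section name,
    with each group's (addr, size) records sorted by address.
    """
    groups = defaultdict(list)
    for section_name, addr, size in tuples:
        groups[section_name].append((addr, size))
    return [(name, sorted(groups[name])) for name in sorted(groups)]


if __name__ == "__main__":
    # A few of the records from the output above, deliberately shuffled.
    records = [
        ("_static_thread_data", 0x1000081B0, 88),
        (".native_PRE_BOOT_1_prio_1_task", 0x100008028, 8),
        ("_static_thread_data", 0x100008158, 88),
        (".native_PRE_BOOT_1_prio_0_task", 0x100008020, 8),
    ]
    for name, entries in group_tuples(records):
        print(name, [(hex(addr), size) for addr, size in entries])
```

Each resulting group corresponds to one contiguous region that a start[] / end[] symbol pair would bracket.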

I am taking a 2-stage linking approach, where the first stage output is linked with -Wl,-undefined,dynamic_lookup. This has the advantage that linking can succeed for the binary with undefined symbols. We do this for all of the start[] and end[] sections.

nm foo | grep "\(_start\|_end\)$"
                 U ___native_FIRST_SLEEP_tasks_start
                 U ___native_ON_EXIT_tasks_start
                 U ___native_PRE_BOOT_1_tasks_start
                 U ___native_PRE_BOOT_2_tasks_start
                 U ___native_PRE_BOOT_3_tasks_start
                 U ___native_tasks_end
                 U ___static_thread_data_list_end
                 U ___static_thread_data_list_start

Furthermore, the linker knows to look for these symbols in a dynamically linked library and fill them in inside of the literal pool!

otool -vVT foo
otool -vVt foo |  grep "\(_start\|_end\)$"
0000000100003c40        ldr     x9, [x9, #0x8] ; literal pool symbol address: ___static_thread_data_list_start
0000000100003c50        ldr     x9, [x9] ; literal pool symbol address: ___static_thread_data_list_end

The next stage involves examining the first binary for z_macho_tuple entries, and for each entry, copying the corresponding bytes to an array in a second output file macho_map.c which is compiled into a libzmachomap.dylib.

Next, the z_macho_map section is completely discarded! It's safe because we've copied the data out into a C source file (now compiled to a .dylib) AND because the start[] and end[] symbols for each iterable section were externalized to a separate library.

Finally, the second-stage link does not use the -Wl,-undefined,dynamic_lookup option and requires all symbols to be resolved, which is satisfied via -lmachomap.

So, it's definitely less invasive in that a special case does not need to be made for every different type of symbol section and should make things Just Work ™️ on macOS for native_posix.

The PoC repo for this stuff is here:
https://github.com/cfriedt/macho-map

Aside from using dynamic linking, I would imagine this isn't too far off from our regular build process for Zephyr. Adding in device handles could be kind of tricky though, so it will likely take a bit of time before that is finally worked out. In the ideal case, the API would be somewhat consistent when compared to pyelftools, which would make for minimal differences in the Zephyr build process.

attn: @stephanosio , @tejlmand , @nashif, @mbolivar-nordic

@cfriedt cfriedt added the area: macOS Support Related to building Zephyr on macOS label Sep 29, 2021
@mbolivar-nordic
Contributor

> In the ideal case, the API would be somewhat consistent when compared to pyelftools, which would make for minimal differences in the Zephyr build process.

#38836 may be relevant then

cfriedt commented Nov 22, 2021

Changing my strategy w.r.t. the 16-byte segment / section issue. Going to patch clang instead so Zephyr should not require any source-level modifications to section names.

Instead of using the entire section name, as-is, will calculate the SHA256 of the section name, then will convert to Base64 representation, and truncate at 16 characters. This approach has been used before by other runtimes (e.g. Rust) that need to fit larger section names into Mach-O binaries.
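The name transformation itself is small. The sketch below shows it in Python for illustration only; the actual change would live inside clang, and `short_section_name` is a hypothetical name.

```python
import base64
import hashlib


def short_section_name(name, length=16):
    """SHA-256 the name, Base64-encode the digest, keep the first `length` chars."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")[:length]


if __name__ == "__main__":
    for name in (".native_PRE_BOOT_11_task", ".native_PRE_BOOT_12_task"):
        print(name, "->", short_section_name(name))
```

Two caveats with this sketch: the standard Base64 alphabet includes `+` and `/`, which fall outside the [a-zA-Z0-9._] set mentioned earlier, and truncating to 16 characters keeps only 96 bits of the digest. Whether either matters in practice would need to be checked against the assembler and linker.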

@github-actions

This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time.

@github-actions github-actions bot added the Stale label Jan 22, 2022
@github-actions github-actions bot closed this Feb 5, 2022