DTS device dependency is shifting memory addresses between builds #31546

andrewboie · 2021-01-22T22:23:13Z

Describe the bug
I was notified Friday morning that my demand paging PR had been reverted after being identified as the cause of a failure in tests/kernel/fatal/exception. The crash occurred very early in boot, right after installing page tables.

However, after digging deeper, this turned out not to be the case: my PR just exposed a different issue in a PR that went in at about the same time.

When I reproduced locally, I found that the memory addresses of page tables were shifting in between builds:

$ nm -n zephyr/zephyr.elf | grep ptables
0000000000120000 R z_x86_kernel_ptables

$ nm -n zephyr/zephyr_prebuilt.elf | grep ptables 
000000000011f000 D z_x86_kernel_ptables

This will totally break the system. We absolutely rely on memory addresses of symbols not shifting between builds, as certain auto-generated CPU structures (such as page tables) will not function correctly if the information used to build them in zephyr_prebuilt.elf becomes stale.
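
A quick way to spot this kind of drift is to compare the `nm` output of the two ELF files symbol by symbol. The following is only a minimal sketch: the helper names `parse_nm` and `shifted_symbols` are made up for illustration, `unchanged_example_symbol` and its address are hypothetical, and the only real values are the two `z_x86_kernel_ptables` addresses shown above.

```python
def parse_nm(text):
    """Parse nm output lines of the form '<hex addr> <type> <name>'."""
    symbols = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank lines and undefined symbols (no address)
        addr, _sym_type, name = parts
        try:
            symbols[name] = int(addr, 16)
        except ValueError:
            continue  # first field was not a hex address
    return symbols


def shifted_symbols(prebuilt, final):
    """Return {name: (prebuilt_addr, final_addr)} for symbols that moved."""
    return {
        name: (prebuilt[name], final[name])
        for name in prebuilt.keys() & final.keys()
        if prebuilt[name] != final[name]
    }


# In practice these strings would come from `nm -n zephyr/zephyr_prebuilt.elf`
# and `nm -n zephyr/zephyr.elf`. The first symbol in each is hypothetical.
PREBUILT_NM = """\
0000000000100000 T unchanged_example_symbol
000000000011f000 D z_x86_kernel_ptables
"""
FINAL_NM = """\
0000000000100000 T unchanged_example_symbol
0000000000120000 R z_x86_kernel_ptables
"""

diff = shifted_symbols(parse_nm(PREBUILT_NM), parse_nm(FINAL_NM))
for name, (old, new) in sorted(diff.items()):
    print(f"{name}: {old:#x} -> {new:#x} ({new - old:+#x})")
```

Any symbol this reports is a candidate for a stale address baked into a structure generated from zephyr_prebuilt.elf.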

In this particular case, the memory mapping for the kernel image ended up incorrectly sized. One of the changes in the demand paging PR is that only the kernel image is now memory-mapped, with other page frames left unmapped and available for anonymous memory mappings.

In addition to this problem, this can also break userspace, as the kernel object tables are created from symbol addresses in zephyr_prebuilt.elf. If the addresses of kernel object symbols change between zephyr_prebuilt.elf and zephyr.elf, no syscalls will work. This is why the kernel object tables themselves are located at the very end of RAM, so that other addresses do not shift.
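
To illustrate why a stale table breaks lookups, here is a toy model, not the actual Zephyr implementation: a table keyed by address is built from one set of symbol addresses and then queried with addresses from a later link in which everything has moved. The object names, addresses, and the `build_kobject_table` helper are all hypothetical.

```python
def build_kobject_table(symbols):
    """Build an address-keyed lookup table from {name: address}."""
    return {addr: name for name, addr in symbols.items()}


# Hypothetical kernel object symbols as seen in the first (prebuilt) link.
prebuilt_syms = {"example_sem": 0x108100, "example_thread": 0x108200}

# The same objects in the final link, after an earlier section grew by
# 2 bytes and pushed everything behind it along.
SHIFT = 2
final_syms = {name: addr + SHIFT for name, addr in prebuilt_syms.items()}

# The table is generated once, from the prebuilt addresses only.
table = build_kobject_table(prebuilt_syms)

# At runtime, objects live at their final addresses, so every lookup misses.
for name, addr in final_syms.items():
    found = table.get(addr)
    print(f"{name} at {addr:#x}: {'found' if found else 'NOT FOUND'}")
```

In this model every validation of a kernel object pointer fails, which corresponds to the "no syscalls will work" outcome described above.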

I then looked at the image to see where the addresses started shifting. The culprits are symbols prefixed with __devicehdl:

 0000000000108070 R __device_handles_start
-0000000000108070 V __devicehdl_DT_N_S_soc_S_uart_2f8
+0000000000108070 R __devicehdl_sys_init_z_clock_driver_init0
 0000000000108070 R __init_APPLICATION_start
 0000000000108070 R __init_end
 0000000000108070 R __init_POST_KERNEL_start
 0000000000108070 R __init_SMP_start
-0000000000108080 V __devicehdl_DT_N_S_soc_S_uart_3f8
-000000000010808a V __devicehdl_sys_init_z_clock_driver_init0
-0000000000108090 R __app_shmem_regions_end
-0000000000108090 R __app_shmem_regions_start
...snip...
+0000000000108078 R __devicehdl_DT_N_S_soc_S_uart_3f8
+0000000000108088 R __devicehdl_DT_N_S_soc_S_uart_2f8
+0000000000108092 R __app_shmem_regions_end
+0000000000108092 R __app_shmem_regions_start

The overall size of these handles is increasing:

 0000000000108070 R __device_handles_start
-0000000000108090 R __device_handles_end
+0000000000108092 R __device_handles_end

We are off by 2 bytes. As bad luck would have it, this particular test case resulted in the size of the kernel image being pushed out by an additional page due to alignment requirements.
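
The arithmetic can be sketched as follows. The handle section boundaries are taken from the listing above. The 4 KiB page size and the unaligned end address `end_before` are assumptions chosen purely for illustration, to show how a 2-byte growth can cross a page boundary once the result is rounded up.

```python
PAGE_SIZE = 0x1000  # assumed 4 KiB page size


def align_up(addr, align=PAGE_SIZE):
    """Round addr up to the next multiple of align (a power of two)."""
    return (addr + align - 1) & ~(align - 1)


handles_start = 0x108070
size_old = 0x108090 - handles_start  # 0x20 bytes
size_new = 0x108092 - handles_start  # 0x22 bytes
growth = size_new - size_old         # 2 bytes

# Hypothetical end of some later region, one byte short of a page boundary.
end_before = 0x11EFFF
end_after = end_before + growth

print(f"handles grew by {growth} bytes ({size_old:#x} -> {size_new:#x})")
print(f"aligned end before: {align_up(end_before):#x}")
print(f"aligned end after:  {align_up(end_after):#x}")
```

With these assumed inputs, the aligned address moves by a full page even though the underlying data only grew by 2 bytes.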

To Reproduce
Check out the tree at the point right before my demand paging PR was reverted and run tests/kernel/fatal/exception using sentinel.conf.

Not sure why other tests for userspace aren't also failing. We might be getting lucky with alignment directives restoring the synchronization between symbol addresses after they get messed up.

Expected behavior
This new DTS infrastructure must not change in size between builds.

Impact
Showstopper.

andrewboie added the bug label Jan 22, 2021

andrewboie assigned pabigot Jan 22, 2021

andrewboie added the priority: high label Jan 22, 2021

pabigot mentioned this issue Jan 22, 2021

device: revert device dependency injection #31548

Merged

andrewboie mentioned this issue Jan 23, 2021

restore demand paging PR + PC fixes #31564

Merged

nashif closed this as completed in #31548 Jan 23, 2021

pabigot mentioned this issue Feb 9, 2021

devicetree-based device definitions and dependency representations reboot #32127

Merged
