tests/drivers/uart/uart_basic_api/drivers.uart.cdc_acm fails to build #38181

Closed
galak opened this issue Aug 31, 2021 · 10 comments · Fixed by #38834
Labels: area: Devicetree, bug, priority: high, Release Blocker
Milestone: v2.7.0

Comments

@galak
Collaborator

galak commented Aug 31, 2021

The following fails:

./scripts/twister -p waveshare_open103z -s tests/drivers/uart/uart_basic_api/drivers.uart.cdc_acm

Traceback (most recent call last):
  File "/home/galak/git/zephyr/scripts/gen_handles.py", line 371, in <module>
    main()
  File "/home/galak/git/zephyr/scripts/gen_handles.py", line 341, in main
    assert len(hdls) < len(hs.handles), "%s no DEVICE_HANDLE_ENDS inserted" % (dev.sym.name,)
AssertionError: __device_dts_ord_256 no DEVICE_HANDLE_ENDS inserted

Git bisect shows the issue was introduced by:

ec331c6

galak added the bug label Aug 31, 2021
galak added the priority: high label Aug 31, 2021
@JordanYates
Collaborator

Looks like the original handle generation had some issues that were hidden by the extra DEVICE_HANDLE_ENDS entries that my commit removed.

gen_handles.py:
Node: /soc/usb@40005c00/cdc_acm_uart0
Orig deps:
        /soc/usb@40005c00
gen_handles.py: Final deps:
        /soc/pin-controller@40010800/gpio@40012000
        /soc/rcc@40021000

The final dependency list can apparently be larger than the original one, which is why the assert triggers.
I will see what can be done to fix this tonight.
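
For reference, the check that fires is roughly this invariant (a minimal sketch with hypothetical names, not the actual gen_handles.py code):

    # Minimal sketch of the size invariant (hypothetical names, not the
    # actual gen_handles.py code): the first link pass reserves a fixed
    # slot per device, and the regenerated dependency list must be
    # strictly shorter so a DEVICE_HANDLE_ENDS terminator still fits.
    def check_handles_fit(dev_name, final_deps, reserved_slot):
        assert len(final_deps) < len(reserved_slot), \
            "%s no DEVICE_HANDLE_ENDS inserted" % (dev_name,)

    # cdc_acm_uart0 reserved space based on one original dep, but
    # resolution produced two final deps, so the assert fires.
    check_handles_fit("__device_dts_ord_256",
                      final_deps=["gpio@40012000", "rcc@40021000"],
                      reserved_slot=["usb@40005c00", "DEVICE_HANDLE_ENDS"])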

@JordanYates
Collaborator

JordanYates commented Sep 1, 2021

So there is a fundamental conflict between how devicetree is outputting device dependencies and what gen_handles.py is generating. This is a problem because they both need to output the same thing for the .handles array to stay the same size across linker stages.

devicetree_unfixed.h contains the _REQUIRES_ORDS property for nodes, which contains direct dependencies only. This list is used in the generation of the first .handles array.

gen_handles.py on the other hand is walking the entire dependency tree to get ALL dependencies of a device, not only the direct ones. The more nested a node is, the larger this array is relative to _REQUIRES_ORDS list.
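
To make the mismatch concrete, here is a rough sketch of the two computations on a {node: direct dependencies} mapping (hypothetical representation, not the actual devicetree tooling):

    # Rough sketch of the two computations (hypothetical representation,
    # not the actual devicetree tooling).
    def direct_deps(graph, node):
        # What _REQUIRES_ORDS encodes: direct dependencies only.
        return list(graph[node])

    def all_deps(graph, node):
        # What a full tree walk yields: the transitive closure.
        seen, stack = set(), list(graph[node])
        while stack:
            dep = stack.pop()
            if dep not in seen:
                seen.add(dep)
                stack.extend(graph.get(dep, ()))
        return sorted(seen)

    graph = {"cdc_acm_uart0": ["usb"], "usb": ["rcc", "pinctrl"]}
    print(direct_deps(graph, "cdc_acm_uart0"))  # ['usb']
    print(all_deps(graph, "cdc_acm_uart0"))     # ['pinctrl', 'rcc', 'usb']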

The obvious solution is to change the behaviour of gen_handles.py to only generate direct dependencies, but I'm not sure whether the original design required the complete tree for some reason, and neither #29644 nor #32127 seems to discuss it.

@mbolivar-nordic @galak thoughts?

@cfriedt
Member

cfriedt commented Sep 7, 2021

@mbolivar-nordic, @jfischer-no - are you able to have a look at this?

@JordanYates
Collaborator

JordanYates commented Sep 9, 2021

gen_handles.py on the other hand is walking the entire dependency tree to get ALL dependencies of a device, not only the direct ones. The more nested a node is, the larger this array is relative to _REQUIRES_ORDS list.

This was slightly inaccurate. gen_handles.py is not getting ALL dependencies, but it does iterate upwards through dependencies until it finds a device that actually exists in the final binary. For the following tree:

           d3_0
        /    |    \
   d2_0    d2_1     d2_2
      \   /     \   /
       d1_0      d1_1
          \     /
          our_dev

If both d1_0 and d1_1 resolve to devices, only d1_0 and d1_1 will be in the output array. But if d1_0 is not present, the output array will be d2_0, d2_1, d1_1. Either way, the problem is the same. It appears on this board because the final cdc-acm node only depends on the parent USB device, which in turn depends on a clock and a pin controller.
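
A rough sketch of that resolution rule on the tree above (hypothetical helper, not the actual gen_handles.py code):

    # Rough sketch of the resolution described above (hypothetical
    # helper, not the actual gen_handles.py code): each absent
    # dependency is replaced by its own dependencies, recursively,
    # until every entry is a node present in the final binary.
    def resolve(graph, node, present, seen=None):
        seen = set() if seen is None else seen
        out = []
        for dep in graph[node]:
            if dep in seen:
                continue
            seen.add(dep)
            if dep in present:
                out.append(dep)
            else:
                out.extend(resolve(graph, dep, present, seen))
        return out

    graph = {
        "our_dev": ["d1_0", "d1_1"],
        "d1_0": ["d2_0", "d2_1"], "d1_1": ["d2_1", "d2_2"],
        "d2_0": ["d3_0"], "d2_1": ["d3_0"], "d2_2": ["d3_0"],
        "d3_0": [],
    }
    print(resolve(graph, "our_dev", {"d1_0", "d1_1"}))          # ['d1_0', 'd1_1']
    print(resolve(graph, "our_dev", {"d2_0", "d2_1", "d1_1"}))  # ['d2_0', 'd2_1', 'd1_1']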

Only generating direct dependencies is a bad idea: on this board the parent USB node does not result in a device, so the cdc-acm node would end up with no dependencies at all, which is obviously incorrect.

Unfortunately, at devicetree parse time we don't know which nodes will result in devices, so there are only three things we could output:

  1. The number of direct dependencies (2 in the above example)
  2. The total number of dependencies in the whole tree (5 in the above example)
  3. A worst case fanout (3 in the above example)

Using 1 for the initial array is a non-starter, as this is what causes the bug, and 2 is strictly worse than 3.
Therefore I think the only two resolutions to this problem are to:

  1. Use a devicetree property that computes option 3 and pad the initial array on all devices to be that long.
  2. Add a board specific fudge factor as a Kconfig that adds N indices to the initial dependency array for all devices.

Option 1 is the technically nicer solution, but is worse from a ROM footprint perspective (the vast majority of boards work with the current setup). Option 2 feels like a giant hack, but ROM footprint won't change for most boards, as a fudge factor of 0 will work fine.
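
For what it's worth, option 3 could be brute-forced per device at generation time. Reusing the resolve() sketch and graph from above (exponential, but fine as an illustration):

    # Brute-force sketch of option 3, reusing resolve() and graph from
    # the sketch above: try every present/absent combination of the
    # other nodes and take the largest resolved array.
    from itertools import combinations

    def worst_case_fanout(graph, node):
        others = [n for n in graph if n != node]
        return max(
            len(resolve(graph, node, set(subset)))
            for r in range(len(others) + 1)
            for subset in combinations(others, r))

    print(worst_case_fanout(graph, "our_dev"))  # 3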

@mbolivar-nordic
Contributor

I don't have a good short-term solution. Computing the worst-case fanout and padding all the devices like that just doesn't feel like a good idea for an LTS release.

I hate to say it, but I think our best bet might be to revert for 2.7 until we have a chance to look at #32129. @tejlmand, any thoughts on that?

@JordanYates
Collaborator

Reverting doesn't solve the problem; the problem existed before that commit.
Unless you're talking about reverting the entire feature, which from my point of view would be a worse solution than the fudge factor.

@cfriedt
Member

cfriedt commented Sep 11, 2021

If I'm understanding this correctly, the problem is that we need additional entries between __device_handles_start and __device_handles_end, but we currently have to keep that region the same size after the first-pass build. Is that right?

I can't see generating dev_handles.c as being the problem, because clearly Python can parse whatever data it needs at build time and generate whatever it needs to.

Does the device_handles array need to be fixed in location and size? I.e. can we simply read the original device_handles section from zephyr_prebuilt.elf into memory, discard it, and then append / inject a newly generated and resized section with any modifications necessary?

At the source / object level, the start and end symbols are extern, so it should be fairly easy to move the goalposts, no?
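
For the read side, that would be straightforward with pyelftools. A sketch (the '.device_handles' section name is my assumption here):

    # Sketch of reading the first-pass handles out of the prebuilt ELF
    # with pyelftools. The section name '.device_handles' is an
    # assumption for illustration.
    from elftools.elf.elffile import ELFFile

    with open("zephyr_prebuilt.elf", "rb") as f:
        elf = ELFFile(f)
        sec = elf.get_section_by_name(".device_handles")
        if sec is not None:
            print("addr=0x%x size=%d" % (sec["sh_addr"], sec["sh_size"]))
            original = sec.data()  # first-pass contents, to be replaced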

@JordanYates
Collaborator

Does the device_handles array need to be fixed in location and size?

Yes; this feature was initially reverted during the v2.5 stabilization period because the arrays weren't constant (see #31548). Hashing of kernel objects is used with CONFIG_USERSPACE, and if these arrays change size or location, that breaks. There is some talk in #32129 (driven by exactly this feature) around arbitrary linker stages that would resolve this limitation, but that doesn't exist yet, so it cannot be used.

@mbolivar-nordic
Contributor

Trying to work out a 'real fix' for after 2.7 here: #38836

@nashif
Member

nashif commented Oct 4, 2021

still in 2.7 branch

nashif added this to the v2.7.0 milestone Oct 4, 2021
nashif added the Release Blocker label Oct 4, 2021
nashif closed this as completed Oct 5, 2021