Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

device: insert padding to ordinal array #38487

Closed
wants to merge 6 commits into from
Closed

device: insert padding to ordinal array #38487

wants to merge 6 commits into from

Conversation

JordanYates
Copy link
Collaborator

@JordanYates JordanYates commented Sep 13, 2021

Insert padding into the handles array in devices to account for the fact that dependencies can "fan-out" to more devices than are included in the direct _REQUIRES_ORDS property, if nodes in that array do not exist in the final binary.

While working on this I attempted to forward down supported devices as well, but this results in paddings of many hundreds of handles for high level nodes like interrupt and clock controllers on heavily interconnected platforms like frdm_k64f. The ROM increase for that can't be justified, and I don't think an algorithm exists to calculate the actual fan-out value, so supporting this information must wait for multiple link stages.

Resolves #38181

@JordanYates JordanYates added this to the v2.7.0 milestone Sep 13, 2021
@github-actions github-actions bot added area: API Changes to public APIs area: Devicetree area: Devicetree Tooling PR modifies or adds a Device Tree tooling labels Sep 13, 2021
@zephyrbot zephyrbot requested a review from dcpleung September 13, 2021 11:58
@cfriedt cfriedt added bug The issue is a bug, or the PR is fixing a bug priority: high High impact/importance bug labels Sep 14, 2021
Copy link
Member

@cfriedt cfriedt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_estimate_fanout() looks awfully familiar ;-)
nice work

@cfriedt
Copy link
Member

cfriedt commented Sep 14, 2021

works on my desk:

./scripts/twister -p waveshare_open103z -s tests/drivers/uart/uart_basic_api/drivers.uart.cdc_acm
INFO    - Zephyr version: v2.7.0-rc1-88-g80e587345c56
INFO    - JOBS: 8
INFO    - Using 'zephyr' toolchain.
INFO    - Building initial testcase list...
INFO    - 1 test scenarios (1 configurations) selected, 0 configurations discarded due to filters.
INFO    - Adding tasks to the queue...
INFO    - Added initial list of jobs to queue
INFO    - Total complete:    1/   1  100%  skipped:    0, failed:    0
INFO    - 1 of 1 test configurations passed (100.00%), 0 failed, 0 skipped with 0 warnings in 38.98 seconds
INFO    - In total 7 test cases were executed, 0 skipped on 1 out of total 394 platforms (0.25%)
INFO    - 0 test configurations executed on platforms, 1 test configurations were only built.
INFO    - Saving reports...
INFO    - Writing xunit report /Users/cfriedt/workspace/zephyrproject/zephyr/twister-out/twister.xml...
INFO    - Writing xunit report /Users/cfriedt/workspace/zephyrproject/zephyr/twister-out/twister_report.xml...
INFO    - Run completed

@JordanYates
Copy link
Collaborator Author

Given the reservations he expressed about this approach, please don't merge this until @mbolivar-nordic has had a chance to review.

@de-nordic
Copy link
Collaborator

Given the reservations he expressed about this approach, please don't merge this until @mbolivar-nordic has had a chance to review.

@JordanYates You may add "DNM" label to stop this from being merged.

@JordanYates JordanYates added the DNM This PR should not be merged (Do Not Merge) label Sep 14, 2021
@@ -132,6 +132,40 @@ def scc_order(self):
return self.__scc_order
__scc_order = None

@staticmethod
def _estimate_fanout(edge_map, root):
# Compute an estimate of the maximum fan-out of the edge_map from a given root.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give a writeup of the algorithm a bit more abstractly along with more of a formal "algorithms 101" argument why it is computing an upper bound on the correct fanout? I'd like to see that recorded in the commit log for when we get back to this later.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I've had a go at improving the explanation.
I never took "Algorithms {N}", so let me know if it's missing some essential component of an algorithm description :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! The observation that the number we want to compute is the number of leaf nodes if the graph is a tree makes sense. but I'm not totally sure about what you mean for observations 2. and 3. I'm still thinking this over as a result. (If you have any friends who are good at graph problems who might be able to provide a proof that it's intractable or a clever algorithm for finding the answer, please ask them also...)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conceptually its a transform from a DAG to a N-ary tree. 2 and 3 are trying to explain that the transform operation cannot reduce the actual maximum fanout. Once you get to an N-ary tree, the maximum fanout is easy to compute, but due to the transforms, it can be somewhat higher than the fanout for the original graph.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still thinking about this; thanks for your patience. I wish Peter were still here; I've had to spin up on the internals of all this again while distracted with other things. I am making this PR a priority for this week though.

Jordan Yates added 6 commits September 16, 2021 20:27
Estimate maximum fan-out of the tree by treating it as an N-ary tree
with special terminating conditions. This approach will overestimate the
value for heavily interdependent trees, but I suspect the actual maximum
fan-out is an NP hard problem.

The algorithm depends on three observations:
1. The maximum fan-out of an N-ary tree is the number of leaf node.
2. Merging two nodes in an N-ary tree (preserving their edges) can only
   decrease the maximum fan-out of the tree.
3. Therefore "cutting" edges by swapping an edge for a new leaf node can
   only increase the fan-out (assuming we don't isolate portions of the
   tree).

Therefore by cutting edges until we have an N-ary tree and then counting
the leaf-nodes, we arrive at an overestimation of the true fan-out.

Signed-off-by: Jordan Yates <[email protected]>
Adds properties to query the estimated fan-out of a node in both
the supported and required directions.

Signed-off-by: Jordan Yates <[email protected]>
Add tests for the new fan-out properties.

Signed-off-by: Jordan Yates <[email protected]>
Generate a property which describes the padding that must be applied
to the `REQUIRES_ORDS` array in order to ensure that the
postprocessing performed in `gen_handles.py` does not change the size
of the array. This list is empty for most nodes on most SOC's.

The list is enumerated in the property as using `UTIL_LISTIFY` inside
`DEVICE_DEFINE` results in compilation failures.

Signed-off-by: Jordan Yates <[email protected]>
Add a macro to query the padding list that need to be inserted into
the ordinal array to ensure that sizes do not increase in the second
linking stage.

Signed-off-by: Jordan Yates <[email protected]>
Insert padding into the handles array to account for potential fan-out
from direct dependencies that don't exist in the final binary.

```
gpio clock
  \   /
   usb
    |
 cdc_acm
```
For example in the above devicetree snippet, creating the handles
array with only 1 dependent device will be insufficient if the usb
node is not present, as 2 ordinals will need to inserted.

This is a temporary solution until #32129 is resolved.

Fixes #38181.

Signed-off-by: Jordan Yates <[email protected]>
@cfriedt
Copy link
Member

cfriedt commented Sep 21, 2021

Here's a wacky idea, assuming we can add arbitrary properties to a DT node.

Re-using the diagram from here.

           d3_0
        /    |    \
   d2_0    d2_1     d2_2
      \   /     \   /
       d1_0      d1_1
          \     /
          our_dev

Why not just add an artifical property in our_dev to point directly at d2_0, d2_1, and d3_0? We should follow phandles too and not just nesting for determining device dependencies.

I think it's just failing 1 test right now, right? So the scope of the workaround is pretty limited.

Maybe we could provide the workaround and then create an Enhancement issue instead?

Turn the bug into a feature 😏

@JordanYates
Copy link
Collaborator Author

Why not just add an artifical property in our_dev to point directly at d2_0, d2_1, and d3_0? We should follow phandles too and not just nesting for determining device dependencies.

I don't understand what this is trying to achieve, and why it only has 3 handles in it?
What is this new property for?

I think it's just failing 1 test right now, right? So the scope of the workaround is pretty limited.

Only because that test is "unlucky", there is no telling what sorts of devicetree structures users have generated.

@cfriedt
Copy link
Member

cfriedt commented Sep 23, 2021

Why not just add an artifical property in our_dev to point directly at d2_0, d2_1, and d3_0? We should follow phandles too and not just nesting for determining device dependencies.

I don't understand what this is trying to achieve, and why it only has 3 handles in it?
What is this new property for?

What determines the size of the dependency array? Direct dependencies, no?

@JordanYates
Copy link
Collaborator Author

What determines the size of the dependency array? Direct dependencies, no?

No, that is the size that is currently allocated for it, but it is not always sufficient. That is what this PR is addressing.

The final size of the dependency array depends on which of the upstream devices are enabled. The dependency tree is iterated over for all paths, whenever a device that exists is found, it is inserted into the array and iteration on that branch stops.

@cfriedt
Copy link
Member

cfriedt commented Sep 23, 2021

What determines the size of the dependency array? Direct dependencies, no?

No, that is the size that is currently allocated for it, but it is not always sufficient. That is what this PR is addressing.

That is exactly what my proposal addresses. CREATE direct dependencies for the additional dependencies that would otherwise be hidden.

It's a workaround, and requires that we know the hidden dependencies in advance for any affected device.

Copy link
Contributor

@mbolivar-nordic mbolivar-nordic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for all your work and analysis here so far; it's been extremely helpful.

My main issues with this approach are:

  1. edtlib should not care about this. This is fundamentally a workaround for a limitation in Zephyr's build system combined with constraints in its userspace implementation. Putting the logic in edtlib is the wrong layering. Especially as we are now trying to use edtlib in https://github.com/devicetree-org/lopper and have existing work spinning out the edtlib codebase into its own project (https://github.com/zephyrproject-rtos/python-devicetree), it is only becoming more important to keep the layering clean.
  2. Similarly, we should not be extending devicetree.h with this. That's a stable API, so it does not make sense to be adding macros like DT_REQUIRES_DEP_ORDS_PADDING. That macro gives back values which are best-effort attempts to upper bound the solutions of a difficult graph problem that are not really related to the devicetree itself (see 1.), so it doesn't belong in devicetree.h. If we resolve the build system limitations from 1., we would have to go through the stable API change process to remove DT_REQUIRES_DEP_ORDS_PADDING, which clearly would not make sense.
  3. There is no formal argument that the computed padding is in fact a valid upper bound on the worst-case value, and the commit logs are pretty scarce on the details we need to check the analysis or improve it later.
  4. Related to 3., as far as I can tell, the problem also exists in the "support" direction as well; I think we just haven't run into it due to a bug in 0c6588f. Notice how the "forward the dependency up one level" step that's done for deps is not being done for sups here: 0c6588f#diff-521e5dcb0b3ae49bca68c34cdb0b300f3341ca9d52d185beab8a4cf8b2bbec96R287. As far as I can tell, that's wrong, because we still need to keep going deeper in the devicetree to find the nodes which do correspond to devices that our node "supports".

To deal with 1. and 2. I think we need to generate a map from node identifiers to padding values in gen_defines.py only, making no changes to python-devicetree. The results should go into a new generated header, named device_handle_padding_workaround.h or so in order to make it clear that this isn't something we should want to live with forever.

To deal with 3. I think we need a better definition of this problem and analysis of it. Like I said a few days ago, I think the tree observation is a good one, but it's not enough, and definitely the commit logs / source code comments don't capture the problem.

To deal with 4., the fix seems clear, but I'm curious if you (@JordanYates) can find an error in my above analysis.

I've been trying to stay out of gen_handles.py as much as possible, but as a result of this issue I have to say I think it's time to refactor it. The way it's written makes it difficult to read and reason about, which is really not good for such a critical script, especially since its original author is no longer working on Zephyr and it is therefore unmaintained. It took a long time for me to spin up on these details, and the script's readability issues did not help.

I'm working on an alternate proposal that should address these issues. I know time is tight for 2.7.0; this continues to be my top priority for the week.

@JordanYates
Copy link
Collaborator Author

  1. edtlib should not care about this. This is fundamentally a workaround for a limitation in Zephyr's build system combined with constraints in its userspace implementation.

The problem is the build system limitation, but the question of "what is the worst case fanout" is a generic graph question, even if it doesn't have much application beyond solving this issue. I do understand you don't want non-formal algorithms in its API however.

  1. Similarly, we should not be extending devicetree.h with this

The stable API considerations makes sense.

  1. There is no formal argument that the computed padding is in fact a valid upper bound on the worst-case value

That's fair, I haven't been able to find an equivalent problem in literature, and I don't have the maths background to formalize a proof.

  1. Related to 3., as far as I can tell, the problem also exists in the "support" direction as well;

I explicitly excluded it from the original PR as I could tell that the size of the output could easily exceed the size of the direct supported devices. Unfortunately I didn't turn that analysis back on the dependency implementation at the time. At one stage for this PR I did add that support back in, but my fan-out algorithm can massively overestimate fan-out for high level nodes like clock controllers (by 100's of handles on complex trees like kinetis).

RE the proposed solutions:

Moving the logic from edtlib to gen_defines.py will be less concise, as you're several layers away from the core data structures, but that's an implementation consideration.

While there is an obvious solution to 4 (forwarding the supports), without a better fan-out estimation (or multiple linker stages), IMO the overhead is too large. I would strongly argue against reverting it entirely however as iterating direct supports are indispensable for some problems (e.g. devices on a power switch).

I thing gen_handles.py is generally fine, sure some parts are a bit opaque, but not enough to necessitate a wholesale rewrite.

@mbolivar-nordic
Copy link
Contributor

mbolivar-nordic commented Sep 24, 2021

I discussed with @tejlmand this morning; he and I agree on the following:

  • The real problem is in the build system and a fix seems possible there, but would be too time-intensive / invasive for us to feel good proposing it for zephyr v2.7.0
  • Since the larger problem was exposed by 0c6588f but is not caused by it, we should introduce a 'fudge factor' workaround for v2.7.0 to allow users to pad the handles arrays in linker pass 1 if gen_handles.py can't fit everything necessary for pass 2
  • We should go for a "real" fix in the build system for v3.0, and can propose a backport to v2.7.x once it's been stabilized in the next release

Related to the final point above,

  1. [...] as far as I can tell, the problem also exists in the "support" direction as well;
    [...] I'm curious if you (@JordanYates) can find an error in my above analysis.

@JordanYates your response does not include a straightforward "yes" or "no", but you seem to agree that there is no error.

So my overall suggestions are:

  • Revert the 'supports' devices handles for zephyr v2.7.0; it isn't sufficiently thought through or stabilized for an LTS release
  • Add a fudge factor as described above to work around the existing bug which may still manifest in the 'depends' direction
  • Go for the real fix in 3.0 and try to backport it if it makes sense

I'm preparing a separate issue for the 'real fix' and PR for the workaround for v2.7.0.

@JordanYates
Copy link
Collaborator Author

I can agree with the first set of points. I disagree with:

Revert the 'supports' devices handles for zephyr v2.7.0; it isn't sufficiently thought through or stabilized for an LTS release

It is exactly as thought out and stabilized as the dependent devices handles, as they are both generated in the same manner and exposed via the same API. If the fudge factor Kconfig is acceptable as a solution, the 3 lines in gen_handles.py that forward dependencies can be added to forward supports, and the issue is resolved for now.

@mbolivar-nordic
Copy link
Contributor

I'm really confused. I keep trying to say that the supports direction is implemented incorrectly and you don't disagree explicitly, but you also don't want it reverted, so where is the error in my analysis?

@JordanYates
Copy link
Collaborator Author

I'm really confused. I keep trying to say that the supports direction is implemented incorrectly and you don't disagree explicitly, but you also don't want it reverted, so where is the error in my analysis?

If I find a bug in some random subsystem, the first port of call is not to revert the entire feature. I'm not sure why this is any different. Yes, the current implementation does not mirror the depend direction, but with 3 lines of code added in gen_handles.py it does.

@carlescufi carlescufi added the dev-review To be discussed in dev-review meeting label Sep 29, 2021
@carlescufi
Copy link
Member

I'm really confused. I keep trying to say that the supports direction is implemented incorrectly and you don't disagree explicitly, but you also don't want it reverted, so where is the error in my analysis?

If I find a bug in some random subsystem, the first port of call is not to revert the entire feature. I'm not sure why this is any different. Yes, the current implementation does not mirror the depend direction, but with 3 lines of code added in gen_handles.py it does.

Since there is a clear disagreement and we are running out of time to choose between this PR and #38834, I have added the dev-review label to both to see if we can make progress into finding common ground in the meeting.

@JordanYates
Copy link
Collaborator Author

I am currently on leave and travelling, and am unable to make dev-review.

Note that I am not arguing for this PR over #38834, instead that the issue could be resolved via the fudge factor Kconfig + the support forwarding, as opposed to the Kconfig + reverting the supports feature.

@mbolivar-nordic mbolivar-nordic removed the dev-review To be discussed in dev-review meeting label Sep 30, 2021
@mbolivar-nordic
Copy link
Contributor

Removed dev-review since we won't have @JordanYates . My primary feeling is that the supports arrays are large, making it more likely that the fudge factor will be needed. The fudge factor default is 0 and I'm putting it in as a last resort for users who might face this bug on existing code, but my hope is that users never have to see it or know about it in practice, which seems likely if we stick to the device requirements only, as they haven't generated any bug reports along these lines.

My secondary feeling is that this bug is evidence that we didn't get enough testing done on this, since we should have had coverage for direct and transitive support related dependencies. Therefore I am not optimistic about putting it into LTS without more testing. We had to revert the requires stuff right before 2.5 and get it in for 2.6 (I may have the exact numbers wrong but it's something like that), and I think a similar level of caution is warranted here especially for an LTS.

@JordanYates JordanYates closed this Oct 6, 2021
@JordanYates JordanYates deleted the 210812_ord_padding branch August 15, 2022 11:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: API Changes to public APIs area: Device Model area: Devicetree Tooling PR modifies or adds a Device Tree tooling area: Devicetree bug The issue is a bug, or the PR is fixing a bug DNM This PR should not be merged (Do Not Merge) priority: high High impact/importance bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

tests/drivers/uart/uart_basic_api/drivers.uart.cdc_acm fails to build
7 participants