device: insert padding to ordinal array #38487
Conversation
`_estimate_fanout()` looks awfully familiar ;-) nice work
works on my desk:
Given the reservations he expressed about this approach, please don't merge this until @mbolivar-nordic has had a chance to review.
@JordanYates You may add the "DNM" label to stop this from being merged.
```
@@ -132,6 +132,40 @@ def scc_order(self):
         return self.__scc_order

     __scc_order = None

+    @staticmethod
+    def _estimate_fanout(edge_map, root):
+        # Compute an estimate of the maximum fan-out of the edge_map
+        # from a given root.
```
Can you give a writeup of the algorithm a bit more abstractly along with more of a formal "algorithms 101" argument why it is computing an upper bound on the correct fanout? I'd like to see that recorded in the commit log for when we get back to this later.
Alright, I've had a go at improving the explanation.
I never took "Algorithms {N}", so let me know if it's missing some essential component of an algorithm description :)
Thanks! The observation that the number we want to compute is the number of leaf nodes if the graph is a tree makes sense, but I'm not totally sure what you mean by observations 2 and 3, so I'm still thinking this over. (If you have any friends who are good at graph problems who might be able to provide a proof that it's intractable, or a clever algorithm for finding the answer, please ask them too...)
Conceptually, it's a transform from a DAG to an N-ary tree. Observations 2 and 3 are trying to explain that the transform operation cannot reduce the actual maximum fan-out. Once you get to an N-ary tree, the maximum fan-out is easy to compute, but due to the transforms it can be somewhat higher than the fan-out of the original graph.
I'm still thinking about this; thanks for your patience. I wish Peter were still here; I've had to spin up on the internals of all this again while distracted with other things. I am making this PR a priority for this week though.
Estimate the maximum fan-out of the tree by treating it as an N-ary tree with special terminating conditions. This approach will overestimate the value for heavily interdependent trees, but I suspect computing the actual maximum fan-out is an NP-hard problem. The algorithm depends on three observations:

1. The maximum fan-out of an N-ary tree is the number of leaf nodes.
2. Merging two nodes in an N-ary tree (preserving their edges) can only decrease the maximum fan-out of the tree.
3. Therefore "cutting" edges by swapping an edge for a new leaf node can only increase the fan-out (assuming we don't isolate portions of the tree).

Therefore, by cutting edges until we have an N-ary tree and then counting the leaf nodes, we arrive at an overestimation of the true fan-out.

Signed-off-by: Jordan Yates <[email protected]>
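The edge-cutting procedure described in this commit message can be sketched roughly as follows. This is an illustrative reimplementation under one reading of the three observations, not the actual edtlib code; the name `estimate_fanout` and the assumption that `edge_map` maps each node to an iterable of its children are mine.

```python
def estimate_fanout(edge_map, root):
    """Upper-bound the maximum fan-out reachable from root.

    Walks the DAG as if it were an N-ary tree: the first time a
    node is reached it is expanded normally; any later edge into
    an already-visited node is "cut" and replaced by a fresh leaf,
    which by observation 3 can only increase the leaf count.
    """
    visited = set()
    leaves = 0
    stack = [root]
    while stack:
        node = stack.pop()
        if node in visited:
            # An edge into an already-expanded node: swap the edge
            # for a synthetic leaf instead of re-expanding it.
            leaves += 1
            continue
        visited.add(node)
        children = edge_map.get(node, ())
        if not children:
            leaves += 1  # a genuine leaf of the resulting tree
        else:
            stack.extend(children)
    return leaves
```

On the diamond graph a → {b, c} → d this returns 2 even though the worst-case fan-out of a is 1 (only d can survive as a leaf), illustrating why the result is an overestimate for interdependent graphs.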
Adds properties to query the estimated fan-out of a node in both the supported and required directions. Signed-off-by: Jordan Yates <[email protected]>
Add tests for the new fan-out properties. Signed-off-by: Jordan Yates <[email protected]>
Generate a property which describes the padding that must be applied to the `REQUIRES_ORDS` array in order to ensure that the postprocessing performed in `gen_handles.py` does not change the size of the array. This list is empty for most nodes on most SoCs. The list is enumerated in the property because using `UTIL_LISTIFY` inside `DEVICE_DEFINE` results in compilation failures. Signed-off-by: Jordan Yates <[email protected]>
Add a macro to query the padding list that needs to be inserted into the ordinal array to ensure that sizes do not increase in the second linking stage. Signed-off-by: Jordan Yates <[email protected]>
Insert padding into the handles array to account for potential fan-out from direct dependencies that don't exist in the final binary.

```
gpio   clock
   \   /
    usb
     |
 cdc_acm
```

For example, in the above devicetree snippet, creating the handles array with only 1 dependent device will be insufficient if the usb node is not present, as 2 ordinals will need to be inserted.

This is a temporary solution until #32129 is resolved.

Fixes #38181.

Signed-off-by: Jordan Yates <[email protected]>
Here's a wacky idea, assuming we can add arbitrary properties to a DT node. Re-using the diagram from here.
Why not just add an artificial property in

I think it's just failing 1 test right now, right? So the scope of the workaround is pretty limited. Maybe we could provide the workaround and then create an

Turn the bug into a feature 😏
I don't understand what this is trying to achieve, and why it only has 3 handles in it?
Only because that test is "unlucky"; there is no telling what sorts of devicetree structures users have generated.
What determines the size of the dependency array? Direct dependencies, no?
No, that is the size that is currently allocated for it, but it is not always sufficient. That is what this PR is addressing. The final size of the dependency array depends on which of the upstream devices are enabled. The dependency tree is iterated over for all paths; whenever a device that exists is found, it is inserted into the array and iteration on that branch stops.
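The branch-stopping iteration described here can be sketched as below. This is an illustrative sketch, not the actual gen_handles.py code; `resolve_handles`, `depends_on`, and `exists` are hypothetical names.

```python
def resolve_handles(node, depends_on, exists):
    """Return the nodes that end up in `node`'s handles array.

    depends_on: callable mapping a node to its direct dependencies.
    exists:     predicate, True if the node has a device in the
                final binary.
    """
    handles = []
    for dep in depends_on(node):
        if exists(dep):
            # Device present: record it and stop on this branch.
            handles.append(dep)
        else:
            # Device absent: keep descending, so its dependencies
            # "fan out" into our handles array instead.
            handles.extend(resolve_handles(dep, depends_on, exists))
    return handles
```

With the gpio/clock/usb/cdc_acm example above, removing usb grows cdc_acm's handles from 1 entry (usb) to 2 (gpio and clock), which is the size change the padding in this PR accounts for.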
That is exactly what my proposal addresses. CREATE direct dependencies for the additional dependencies that would otherwise be hidden. It's a workaround, and requires that we know the hidden dependencies in advance for any affected device.
Thanks again for all your work and analysis here so far; it's been extremely helpful.
My main issues with this approach are:
- edtlib should not care about this. This is fundamentally a workaround for a limitation in Zephyr's build system combined with constraints in its userspace implementation. Putting the logic in edtlib is the wrong layering. Especially as we are now trying to use edtlib in https://github.com/devicetree-org/lopper and have existing work spinning out the edtlib codebase into its own project (https://github.com/zephyrproject-rtos/python-devicetree), it is only becoming more important to keep the layering clean.
- Similarly, we should not be extending devicetree.h with this. That's a stable API, so it does not make sense to be adding macros like DT_REQUIRES_DEP_ORDS_PADDING. That macro gives back values which are best-effort attempts to upper bound the solutions of a difficult graph problem that are not really related to the devicetree itself (see 1.), so it doesn't belong in devicetree.h. If we resolve the build system limitations from 1., we would have to go through the stable API change process to remove DT_REQUIRES_DEP_ORDS_PADDING, which clearly would not make sense.
- There is no formal argument that the computed padding is in fact a valid upper bound on the worst-case value, and the commit logs are pretty scarce on the details we need to check the analysis or improve it later.
- Related to 3., as far as I can tell, the problem also exists in the "support" direction; I think we just haven't run into it due to a bug in 0c6588f. Notice how the "forward the dependency up one level" step that's done for `deps` is not being done for `sups` here: 0c6588f#diff-521e5dcb0b3ae49bca68c34cdb0b300f3341ca9d52d185beab8a4cf8b2bbec96R287. As far as I can tell, that's wrong, because we still need to keep going deeper in the devicetree to find the nodes which do correspond to devices that our node "supports".
To deal with 1. and 2. I think we need to generate a map from node identifiers to padding values in gen_defines.py only, making no changes to python-devicetree. The results should go into a new generated header, named `device_handle_padding_workaround.h` or so, in order to make it clear that this isn't something we should want to live with forever.
To deal with 3. I think we need a better definition of this problem and analysis of it. Like I said a few days ago, I think the tree observation is a good one, but it's not enough, and definitely the commit logs / source code comments don't capture the problem.
To deal with 4., the fix seems clear, but I'm curious if you (@JordanYates) can find an error in my above analysis.
I've been trying to stay out of gen_handles.py as much as possible, but as a result of this issue I have to say I think it's time to refactor it. The way it's written makes it difficult to read and reason about, which is really not good for such a critical script, especially since its original author is no longer working on Zephyr and it is therefore unmaintained. It took a long time for me to spin up on these details, and the script's readability issues did not help.
I'm working on an alternate proposal that should address these issues. I know time is tight for 2.7.0; this continues to be my top priority for the week.
The problem is the build system limitation, but the question of "what is the worst-case fan-out" is a generic graph question, even if it doesn't have much application beyond solving this issue. I do understand that you don't want non-formal algorithms in its API, however.
The stable API considerations make sense.
That's fair; I haven't been able to find an equivalent problem in the literature, and I don't have the maths background to formalize a proof.
I explicitly excluded it from the original PR as I could tell that the size of the output could easily exceed the size of the direct supported devices. Unfortunately I didn't turn that analysis back on the dependency implementation at the time. At one stage for this PR I did add that support back in, but my fan-out algorithm can massively overestimate fan-out for high-level nodes like clock controllers (by 100's of handles on complex trees like

RE the proposed solutions: Moving the logic from

While there is an obvious solution to 4 (forwarding the supports), without a better fan-out estimation (or multiple linker stages), IMO the overhead is too large. I would strongly argue against reverting it entirely however, as iterating direct supports is indispensable for some problems (e.g. devices on a power switch). I think
I discussed with @tejlmand this morning; he and I agree on the following:
Related to the final point above,
@JordanYates your response does not include a straightforward "yes" or "no", but you seem to agree that there is no error. So my overall suggestions are:
I'm preparing a separate issue for the 'real fix' and a PR for the workaround for v2.7.0.
I can agree with the first set of points. I disagree with:
It is exactly as thought out and stabilized as the dependent-device handles, as they are both generated in the same manner and exposed via the same API. If the fudge-factor Kconfig is acceptable as a solution, the 3 lines in
I'm really confused. I keep trying to say that the supports direction is implemented incorrectly and you don't disagree explicitly, but you also don't want it reverted, so where is the error in my analysis?
If I find a bug in some random subsystem, the first port of call is not to revert the entire feature; I'm not sure why this is any different. Yes, the current implementation does not mirror the depend direction, but with 3 lines of code added in
Since there is a clear disagreement and we are running out of time to choose between this PR and #38834, I have added the dev-review label to both to see if we can make progress into finding common ground in the meeting.
I am currently on leave and travelling, and am unable to make dev-review. Note that I am not arguing for this PR over #38834, instead that the issue could be resolved via the fudge-factor Kconfig + the support forwarding, as opposed to the Kconfig + reverting the supports feature.
Removed dev-review since we won't have @JordanYates.

My primary feeling is that the supports arrays are large, making it more likely that the fudge factor will be needed. The fudge factor default is 0 and I'm putting it in as a last resort for users who might face this bug on existing code, but my hope is that users never have to see it or know about it in practice, which seems likely if we stick to the device requirements only, as they haven't generated any bug reports along these lines.

My secondary feeling is that this bug is evidence that we didn't get enough testing done on this, since we should have had coverage for direct and transitive support related dependencies. Therefore I am not optimistic about putting it into LTS without more testing. We had to revert the requires stuff right before 2.5 and get it in for 2.6 (I may have the exact numbers wrong but it's something like that), and I think a similar level of caution is warranted here, especially for an LTS.
Insert padding into the handles array in devices to account for the fact that dependencies can "fan-out" to more devices than are included in the direct `_REQUIRES_ORDS` property, if nodes in that array do not exist in the final binary.

While working on this I attempted to forward down supported devices as well, but this results in paddings of many hundreds of handles for high-level nodes like interrupt and clock controllers on heavily interconnected platforms like `frdm_k64f`. The ROM increase for that can't be justified, and I don't think an algorithm exists to calculate the actual fan-out value, so supporting this information must wait for multiple link stages.

Resolves #38181