-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add <mstd_tuple> and ARMC5 <tuple> #11265
Conversation
@kjbracey-arm, thank you for your changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Early review, it looks very good. I still need to review the infamous tuple_cat
.
struct is_tuple : std::false_type { }; | ||
|
||
template <typename... T> | ||
struct is_tuple<tuple<int, T...>> : std::true_type { }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure to understand why we need the int
or index_sequence
as the first template parameter; a tuple can begin with a different template argument and even with no argument at all : tuple<>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that was intended to be tuple_base<index_sequence<I...>, T...>
It was fun, but I don't think you want to ever use that implementation. It's there for completeness, but without Other compilers can do quite a good job with One remaining oddity and potentially slight inefficiency is that Seems weird, and contrary to all the ARM EABI docs I can find, which say that C/C++ structures are always passed embedded in the parameter list - only languages with non-constant size structures are supposed to pass them by-value via a pointer. (puzzled emoji here) |
Code generated with a regular struct is very similar (somehow tuple code is smaller 😕 ): https://godbolt.org/z/puSLsq Edit: similar when using a temporary: https://godbolt.org/z/Gg_GRt |
Largely because GCC is bad at handling structs. With the normal EABI, the actual tuple or struct on the stack can in principle be eliminated altogether ARMC5 would happily do this - it has a structure-splitting pass that can break a structure So with the normal ABI, I would expect that to compile to:
That would be impossible with the modified The ARM compiler added this optimisation at the same time as it added 64-bit integers. They are treated as two-word structures initially, but that loses loads of optimisations. The structure-splitting pass re-optimises them, as long as you don't take their address, and is generalised to do this for all small uniform "tuples", not just GCC lacks this optimisation, afaict, and produces bad small struct and 64-bit int code by comparison. Not sure how clang/ARMC6 copes. |
Thanks for the explanation, after some tests it looks like this optimisation is present with GCC on ARM64 and x86 😞 . I've spotted more things that needs to be addressed. I will list them latter today. Edit: There's also an overhead in defining the move constructor by hand (tried on GCC ARM 64). I'm not sure why and if it impacts ARMCC or not. |
Whoops - edited your comment rather than replying to it. Silly UI. That's quite confusing now :) |
Well, I have to define it by hand for ARMC5 - it doesn't support But I should be able to avoid defining at at all for trivial structures, so it falls back to the built-in copy. |
Yes, that's what I tried: https://godbolt.org/z/koUtFJ (you can also try for yourself the impact of default move constructor). I guess that optimisation is in the backend and not present for 32 bits arm.
Wouldn't it be possible to not declare it explicitly and use the old implicit instantiation mechanism ? Of course overloads that swallow a copy/move constructors have to be disabled for the type. |
Well, that's horrible. I imagine the optimisation may be there, but is being inhibited by sub-optimal tracking of "address-taken" attributes. It needs to be confident that no-one really needs the address. Maybe something in the ARM32 ABI leaves "address-taken" set due to an internal "memcpy" of the structure into position for the function call.
If I don't declare it, then I believe you just get the copy constructor. The move constructor or assignment are never implicitly generated, right? So I think I can not declare it when I detect that I am both |
Rules for implicitly declared move constructors are:
So yes it can be implicitly generated. |
Hmm, I'm violating the rule of 3/5/0, aren't I? None of the tuple machinery should need to actually declare any of those, right? I think I've only ended up like this because the standard itself violates the rule, by declaring 4 of them - copy/move assignment/construction as |
I suppose we can remove copy/move constructors, assignment operators and follow the rule of 0 route as ARMCC doesn't allow us to use the rule of 5. |
Adjusted following comments - |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Beside minor import issues; it all looks good to me. Some tests that exercise the implementation would be useful.
* | ||
* http://blogs.microsoft.co.il/sasha/2015/01/12/implementing-tuple-part-1/ | ||
* | ||
* tuple_cat based on Peter Dimov's article "Simple C++11 metaprogramming", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for putting the references; it could help future maintainers. Dimov's article really is brilliant and easy to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wasn't going to claim I'd devised all that myself :)
future maintainers
This is only a bridge for as long as ARMC5 limps on, so I'm not expecting a particularly extended lifetime in Mbed OS anyway.
tuples will be useful for things like `mbed::Event` and `mbed::Callback` - storing parameter packs from variadic templates. Create a C++14(ish) `<tuple>` for ARMC5, and a `<mstd_tuple>` that adds `apply` and `make_from_tuple` from C++17.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
CI started |
Test run: FAILEDSummary: 1 of 4 test jobs failed Failed test jobs:
|
IAR returned error -11 but no error msg? I restarted to confirm, please check. |
Test run: SUCCESSSummary: 11 of 11 test jobs passed |
For the record, @pan- and I did a bit more digging on the efficiency issues, and it's now understood. The generic (ie Itanium) C++ ABI that ARM uses says:
So, if a The Therefore GCC However, GCC has another issue, which means that the structure splitting optimisation apparently never activates for ARM32.; itdoes for ARM64 or x86-64. This means that Phew. |
Description
tuple
s will be useful for things likembed::Event
andmbed::Callback
- storing parameter packs from variadic templates.Create a C++14(ish)
<tuple>
for ARMC5, and a<mstd_tuple>
that addsapply
andmake_from_tuple
from C++17.Pull request type
Release Notes
This is an extension to #11039, and doesn't need any further notes beyond that.