Ranges: Generate is weird
Denis Yaroshevskiy edited this page May 18, 2021
·
1 revision
One consequence of us aligning things is that some algorithms behave weirdly.
In regular C++ I can write std::iota in the following way:
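#include <algorithm>  // std::generate
#include <iterator>   // std::forward_iterator

// std::generate needs forward iterators, and std::output_iterator would
// also need the value type as an argument, so constrain on forward_iterator.
template< std::forward_iterator I, class T >
void iota( I f, I l, T v )
{
    std::generate(f, l, [&v]() mutable { return v++; });
}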
This could naively translate to SIMD like so:
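template< std::forward_iterator I, std::integral T >
void iota( I f, I l, T v )
{
    eve::wide<T> wv{v};                               // every lane holds v
    eve::wide<T> step([](int i, int) { return i; });  // lanes hold 0, 1, 2, ...
    generate(f, l, [&]() mutable {
        auto res = wv + step;            // v, v + 1, v + 2, ...
        wv += T(eve::wide<T>::size());   // advance by one register's worth of elements
        return res;
    });
}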
NOTE: the extra integral restriction is to make the step easier to do, nothing more.
However, in SIMD this is either not correct or not efficient! (and it had better be efficient :P).
The problem is, if my iterator is not aligned, the algorithm should try to align it. The first call to the callback then produces a full register, but the lanes that fall before the real start of the range are thrown away. This messes with the offset and I might get something like [3, 4, 5, 6, ...] even if the initial value is 0.
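To make the offset problem concrete, here is a minimal scalar model of it. This is not eve code: Wide, kWidth, aligning_generate and naive_iota are hypothetical names made up for the illustration, and the "register" is just a std::array of 4 ints. The model assumes the algorithm calls the callback once per full register and discards the lanes that sit before the first real element.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kWidth = 4;      // hypothetical register width
using Wide = std::array<int, kWidth>;  // hypothetical stand-in for a SIMD register

// Model of a generate that aligns its start. `misalignment` is how many
// elements the output start lies past the previous register boundary.
template <class Gen>
std::vector<int> aligning_generate(std::size_t n, std::size_t misalignment, Gen gen)
{
    assert(misalignment < kWidth);
    std::vector<int> out;
    std::size_t skip = misalignment;  // lanes before the real start
    while (out.size() < n) {
        Wide reg = gen();             // the callback always fills a full register
        for (std::size_t lane = skip; lane < kWidth && out.size() < n; ++lane)
            out.push_back(reg[lane]); // masked store: skipped lanes are discarded
        skip = 0;                     // only the first register is partial at the front
    }
    return out;
}

// The naive iota callback, written against the model.
inline std::vector<int> naive_iota(std::size_t n, std::size_t misalignment, int v)
{
    return aligning_generate(n, misalignment, [&v] {
        Wide res;
        for (std::size_t lane = 0; lane < kWidth; ++lane)
            res[lane] = v + static_cast<int>(lane);
        v += static_cast<int>(kWidth);
        return res;
    });
}
```

With a misalignment of 0 the model yields 0, 1, 2, 3. With a misalignment of 3 the same call yields 3, 4, 5, 6, because the first three generated values land in discarded lanes.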
Obviously we can fix it for iota, but for a general-purpose generate this might still bite users, especially if they only test on a std::vector, which would just happen to allocate memory that is aligned enough.
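One way the iota-specific fix might look is to shift the starting value back by the number of discarded lanes. The sketch below is a scalar model, not eve code: kWidth, Wide and aligned_iota are hypothetical names, and it assumes the algorithm knows how many elements the output start lies past a register boundary.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kWidth = 4;      // hypothetical register width
using Wide = std::array<int, kWidth>;  // hypothetical stand-in for a SIMD register

// Model of an iota that compensates for the lanes discarded by alignment.
inline std::vector<int> aligned_iota(std::size_t n, std::size_t misalignment, int v)
{
    assert(misalignment < kWidth);
    // Value for lane 0 of the first register, so that the first live lane holds v.
    int base = v - static_cast<int>(misalignment);
    std::vector<int> out;
    std::size_t skip = misalignment;
    while (out.size() < n) {
        Wide reg;
        for (std::size_t lane = 0; lane < kWidth; ++lane)
            reg[lane] = base + static_cast<int>(lane);
        base += static_cast<int>(kWidth);
        for (std::size_t lane = skip; lane < kWidth && out.size() < n; ++lane)
            out.push_back(reg[lane]);  // masked store of the live lanes only
        skip = 0;
    }
    return out;
}
```

This only works because iota can compute its value from a position. A general callback with hidden state has no such escape hatch, which is why generate stays a problem.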
Possible solutions:
- Documentation. We just accept that this is the behaviour.
- Force the precise iteration for generate. Just mentioning it for completeness, I really don't want to sacrifice perf. Also, people often keep state in other algorithms too: why not transform plus some dynamic offset or something.
- do_not_partially_align trait (bikeshed pending). We should have it regardless of this, and with it we can guarantee that the code example will work. Obviously not the most performant option.
- Users can write their own algorithm for this case. It's just quite a bit of work.
- I was also thinking of some trait to get more information in the callback, but I couldn't figure out how to even write iota with it.
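For the do_not_partially_align option, the guarantee could be modelled roughly as below. This is a scalar sketch, not eve code: kWidth, Wide, unaligned_generate and naive_iota_unaligned are hypothetical names, and the assumption is that the first register starts exactly at the first element, so only the last register can be partial.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kWidth = 4;      // hypothetical register width
using Wide = std::array<int, kWidth>;  // hypothetical stand-in for a SIMD register

// Model of generate when the start is never aligned down: no generated lane
// is discarded at the front, only the tail may be cut short.
template <class Gen>
std::vector<int> unaligned_generate(std::size_t n, Gen gen)
{
    std::vector<int> out;
    while (out.size() < n) {
        Wide reg = gen();
        for (std::size_t lane = 0; lane < kWidth && out.size() < n; ++lane)
            out.push_back(reg[lane]);
    }
    assert(out.size() == n);
    return out;
}

// The naive iota callback, unchanged, driven by the non-aligning model.
inline std::vector<int> naive_iota_unaligned(std::size_t n, int v)
{
    return unaligned_generate(n, [&v] {
        Wide res;
        for (std::size_t lane = 0; lane < kWidth; ++lane)
            res[lane] = v + static_cast<int>(lane);
        v += static_cast<int>(kWidth);
        return res;
    });
}
```

In this model the result no longer depends on where the output happens to sit in memory. The price in a real implementation would be unaligned stores for the whole range.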