Skip to content
This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

reduce_by_key (CUDA) should not assume its output iterator is default constructible #1812

Closed
harrism opened this issue Oct 11, 2022 · 5 comments
Labels

Comments

@harrism
Copy link
Contributor

harrism commented Oct 11, 2022

As discussed in #1804, thrust::reduce_by_key's CUDA implementation assumes that the output iterator it is passed can be default constructed.

pair<KeysOutputIt, ValuesOutputIt> result{};

This is made necessary by the implementation of the dispatch macro:

#define THRUST_INDEX_TYPE_DISPATCH(status, call, count, arguments) \
if (count <= thrust::detail::integer_traits<thrust::detail::int32_t>::const_max) { \
auto THRUST_PP_CAT2(count, _fixed) = static_cast<thrust::detail::int32_t>(count); \
status = call arguments; \
} \
else { \
auto THRUST_PP_CAT2(count, _fixed) = static_cast<thrust::detail::int64_t>(count); \
status = call arguments; \
}

The dispatch mechanism should be modified so that it's return value can be used to initialize a local so that a temporary does not need to be default constructed in order to use it.

@jrhemstad
Copy link
Collaborator

I wonder if one sneaky but convenient solution to this would be to use a thrust::optional for the result object that is passed to the dispatch macro, so:

    thrust::optional<pair<KeysOutputIt, ValuesOutputIt>> result{};
    THRUST_INDEX_TYPE_DISPATCH(result,
                               reduce_by_key_dispatch,
                               num_items,
                               (policy,
                                keys_first,
                                num_items_fixed,
                                values_first,
                                keys_output,
                                values_output,
                                equality_op,
                                reduction_op));

    return *result;

This would obviate the need to default construct an instance of the output iterators and doesn't require any changes to the dispatch macro as the status = call arguments will invoke the optional::operator= that fills the optional with a value.

Then we can extract the value before returning.

@harrism
Copy link
Contributor Author

harrism commented Oct 11, 2022

I was thinking of trying a new dispatch macro and use it at first only in this API. If it works, it can be gradually adopted elsewhere...

I still hate the "num_items_fixed" hack...

@harrism
Copy link
Contributor Author

harrism commented Dec 5, 2022

@senior-zero Perhaps you didn't see this issue when I filed it, but I think this should now be closed along with #1826?

Please also look at NVIDIA/cccl#821.

@gevtushenko
Copy link
Collaborator

Hello, @harrism! I've missed this issue indeed. I don't think it should be closed, though. I'd be glad to have an index dispatch version that returns a value. I think this would simplify the code and make future issues less probable.

@jrhemstad
Copy link
Collaborator

closed by #1827

We can open a new issue later as necessary when we update the dispatch mechanism across all algorithms.

@github-project-automation github-project-automation bot moved this to Done in CCCL Mar 7, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
Archived in project
Development

No branches or pull requests

3 participants