Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement zip_view for external sort. #4930

Merged
merged 2 commits into from
May 2, 2024
Merged

Implement zip_view for external sort. #4930

merged 2 commits into from
May 2, 2024

Conversation

lums658
Copy link
Contributor

@lums658 lums658 commented May 1, 2024

This PR implements a zip_view for zipping together a set of ranges. It is intended to implement the std::ranges::zip_view as defined for C++23. From
https://en.cppreference.com/w/cpp/ranges/zip_view:

  1. A zip_view is a range adaptor that takes one or more views, and produces a view whose ith element is a tuple-like value consisting of the ith elements of all views. The size of produced view is the minimum of sizes of all adapted views.
  2. zip() is a customization point object that constructs a zip_view

Currently, the zip_view only supports zipping together ranges that are random_access_ranges. In addition, the size() and end() functions are only provided if all of the ranges are sized_ranges

The iterator from a zip_view is essentially a tuple of pointers to the beginning of each of the zipped ranges, plus an index that keeps track of the iterator's location in the zipped ranges. The size of a zip_view is the size of the smallest range. The end iterator of a zip view is the begin iterator plus the size of the zip_view.

Unit tests have similar coverage to the tests for the var length views. Tests also include zipping a var length view and iterating through with std::for_each and for.

[sc-43639]


TYPE: IMPROVEMENT
DESC: Implement zip_view for external sort.

This PR is stacked on #4925.  It adds additional constructors for the two view classes for var length data.

The previous implementations had constructors that just took input ranges, and constructed a var length view over the data specified by those ranges.  However, in general, we may have a range that is not completely filled with data, and so may need to specify the data to be viewed in some other way.  The complete set of constructors take
* ranges for data and offsets (arrow format) - the size of the resulting view is one less than the size of the offset range
* ranges plus sizes for data and offsets (arrow format) - the size of the resulting view is one less than the size given for the offset range
* iterator pairs for data and offsets (arrow format) - the size of the resulting view is one less than the difference between end and begin of the offsets pair
* iterator pairs for data and offsets, with sizes (arrow format) - the size of the resulting view is one less than the size given for the offset range
* ranges for data and offsets (tilledb format) - the size of the resulting view is equal to the size of the offset range (takes an extra argument for last offset value)
* ranges plus sizes for data and offsets (tiledb format) - the size of the resulting view is equal to the size given for the offset range (takes an extra argument for last offset value)
* iterator pairs for data and offsets (tiledb format) - the size of the resulting view is equal to the difference between end and begin of the offsets pair (takes an extra argument for last offset value)
* iterator pairs for data and offsets, with sizes (tiledb format) - the size of the resulting view is equal than the size given for the offset range (takes an extra argument for last offset value)

---
TYPE: NO_HISTORY
DESC: Additional constructors for external sort classes.
Base automatically changed from al/additional-constructors/ch44576 to dev May 1, 2024 18:00
@KiterLuc KiterLuc closed this May 1, 2024
@KiterLuc KiterLuc reopened this May 1, 2024
This PR implements a `zip_view` for zipping together a set of ranges.
It is intended to implement the `std::ranges::zip_view` as defined for C++23.  From
 https://en.cppreference.com/w/cpp/ranges/zip_view:
 1.  A zip_view is a range adaptor that takes one or more views, and produces
 a view whose ith element is a tuple-like value consisting of the ith elements
 of all views. The size of produced view is the minimum of sizes of all
 adapted views.
  2. `zip()` is a customization point object that constructs a
 `zip_view`

Currently, the `zip_view` only supports zipping together ranges that are `random_access_range`s.  In addition, the `size()` and `end()` functions are only provided if all of the ranges are `sized_range`s

The iterator from a `zip_view` is essentially a tuple of pointers to the beginning of each of the zipped ranges, plus an index that keeps track of the iterator's location in the zipped ranges.  The size of a `zip_view` is the size of the smallest range.  The end iterator of a zip view is the begin iterator plus the size of the zip_view.

Unit tests have similar coverage to the tests for the var length views.  Tests also include zipping a var length view and iterating through with `std::for_each` and `for`.

---
TYPE: IMPROVEMENT
DESC: Implement zip_view for external sort.
@KiterLuc KiterLuc force-pushed the al/zip-view/sc43639 branch from 107d496 to 9b03531 Compare May 2, 2024 07:05
KiterLuc pushed a commit that referenced this pull request May 2, 2024
This PR stacks on #4930.  The actual code for implementing the PR is fairly small, but with extensive unit tests.

The PR contains `swap` function for `zip_view` in order to be able to use `std::sort` on a `zip_view`.  The difficulty is that dereferencing a `zip_view` iterator returns a proxy, specifically a tuple of references.  In order to enable `swap` to be found for this via ADL, we write
```c++
template <class... Ts>
  requires(std::is_swappable<Ts>::value && ...)
void swap(std::tuple<Ts...>&& x, std::tuple<Ts...>&& y) {
  using std::get;
  using std::swap;
  [&]<std::size_t... i>(std::index_sequence<i...>) {
    (swap(get<i>(x), get<i>(y)), ...);
  }(std::make_index_sequence<sizeof...(Ts)>());
}
```
and then put it into the `std` namespace:
```c++
namespace std {
using ::swap;
}
```

The provided unit tests exercise `swap`, `iter_swap`, and `sort`, including on `zip_view`s containing `alt_var_length_view`s.

(NOTE:  I didn't specifically write an `item_swap` as `std::iter_swap` falls back to `swap`.)

The PR also changes `alt_var_length_view` and `zip_view` to derive from `std::ranges::view_interface` rather than `std::ranges::view_base` in order to automagically get `operator[]` for the view.

NOTES:
1. Not all implementations of `std::sort` use `std::swap` for all problem sizes, but will use insertion sort instead for small sizes.
2. For insertion sort to work properly with a tuple of references, i.e., a proxy, it is important to get all of the details of `value_type` and `reference` correct.  There was a bug that took a while to track down of `value_type` not being defined in the `zip_iterator`.  The `iterator_facade` will create the wrong default thing.

---
TYPE: IMPROVEMENT
DESC: Write iter_swap for zip_view for external sort.
KiterLuc pushed a commit that referenced this pull request May 2, 2024
This PR stacks on #4930.  The actual code for implementing the PR is fairly small, but with extensive unit tests.

The PR contains `swap` function for `zip_view` in order to be able to use `std::sort` on a `zip_view`.  The difficulty is that dereferencing a `zip_view` iterator returns a proxy, specifically a tuple of references.  In order to enable `swap` to be found for this via ADL, we write
```c++
template <class... Ts>
  requires(std::is_swappable<Ts>::value && ...)
void swap(std::tuple<Ts...>&& x, std::tuple<Ts...>&& y) {
  using std::get;
  using std::swap;
  [&]<std::size_t... i>(std::index_sequence<i...>) {
    (swap(get<i>(x), get<i>(y)), ...);
  }(std::make_index_sequence<sizeof...(Ts)>());
}
```
and then put it into the `std` namespace:
```c++
namespace std {
using ::swap;
}
```

The provided unit tests exercise `swap`, `iter_swap`, and `sort`, including on `zip_view`s containing `alt_var_length_view`s.

(NOTE:  I didn't specifically write an `item_swap` as `std::iter_swap` falls back to `swap`.)

The PR also changes `alt_var_length_view` and `zip_view` to derive from `std::ranges::view_interface` rather than `std::ranges::view_base` in order to automagically get `operator[]` for the view.

NOTES:
1. Not all implementations of `std::sort` use `std::swap` for all problem sizes, but will use insertion sort instead for small sizes.
2. For insertion sort to work properly with a tuple of references, i.e., a proxy, it is important to get all of the details of `value_type` and `reference` correct.  There was a bug that took a while to track down of `value_type` not being defined in the `zip_iterator`.  The `iterator_facade` will create the wrong default thing.

---
TYPE: IMPROVEMENT
DESC: Write iter_swap for zip_view for external sort.
@KiterLuc KiterLuc merged commit d861e95 into dev May 2, 2024
58 checks passed
@KiterLuc KiterLuc deleted the al/zip-view/sc43639 branch May 2, 2024 07:56
KiterLuc pushed a commit that referenced this pull request May 2, 2024
This PR stacks on #4930.  The actual code for implementing the PR is fairly small, but with extensive unit tests.

The PR contains `swap` function for `zip_view` in order to be able to use `std::sort` on a `zip_view`.  The difficulty is that dereferencing a `zip_view` iterator returns a proxy, specifically a tuple of references.  In order to enable `swap` to be found for this via ADL, we write
```c++
template <class... Ts>
  requires(std::is_swappable<Ts>::value && ...)
void swap(std::tuple<Ts...>&& x, std::tuple<Ts...>&& y) {
  using std::get;
  using std::swap;
  [&]<std::size_t... i>(std::index_sequence<i...>) {
    (swap(get<i>(x), get<i>(y)), ...);
  }(std::make_index_sequence<sizeof...(Ts)>());
}
```
and then put it into the `std` namespace:
```c++
namespace std {
using ::swap;
}
```

The provided unit tests exercise `swap`, `iter_swap`, and `sort`, including on `zip_view`s containing `alt_var_length_view`s.

(NOTE:  I didn't specifically write an `item_swap` as `std::iter_swap` falls back to `swap`.)

The PR also changes `alt_var_length_view` and `zip_view` to derive from `std::ranges::view_interface` rather than `std::ranges::view_base` in order to automagically get `operator[]` for the view.

NOTES:
1. Not all implementations of `std::sort` use `std::swap` for all problem sizes, but will use insertion sort instead for small sizes.
2. For insertion sort to work properly with a tuple of references, i.e., a proxy, it is important to get all of the details of `value_type` and `reference` correct.  There was a bug that took a while to track down of `value_type` not being defined in the `zip_iterator`.  The `iterator_facade` will create the wrong default thing.

---
TYPE: IMPROVEMENT
DESC: Write iter_swap for zip_view for external sort.
KiterLuc pushed a commit that referenced this pull request May 2, 2024
This PR stacks on #4930.  The actual code for implementing the PR is fairly small, but with extensive unit tests.

The PR contains `swap` function for `zip_view` in order to be able to use `std::sort` on a `zip_view`.  The difficulty is that dereferencing a `zip_view` iterator returns a proxy, specifically a tuple of references.  In order to enable `swap` to be found for this via ADL, we write
```c++
template <class... Ts>
  requires(std::is_swappable<Ts>::value && ...)
void swap(std::tuple<Ts...>&& x, std::tuple<Ts...>&& y) {
  using std::get;
  using std::swap;
  [&]<std::size_t... i>(std::index_sequence<i...>) {
    (swap(get<i>(x), get<i>(y)), ...);
  }(std::make_index_sequence<sizeof...(Ts)>());
}
```
and then put it into the `std` namespace:
```c++
namespace std {
using ::swap;
}
```

The provided unit tests exercise `swap`, `iter_swap`, and `sort`, including on `zip_view`s containing `alt_var_length_view`s.

(NOTE:  I didn't specifically write an `item_swap` as `std::iter_swap` falls back to `swap`.)

The PR also changes `alt_var_length_view` and `zip_view` to derive from `std::ranges::view_interface` rather than `std::ranges::view_base` in order to automagically get `operator[]` for the view.

NOTES:
1. Not all implementations of `std::sort` use `std::swap` for all problem sizes, but will use insertion sort instead for small sizes.
2. For insertion sort to work properly with a tuple of references, i.e., a proxy, it is important to get all of the details of `value_type` and `reference` correct.  There was a bug that took a while to track down of `value_type` not being defined in the `zip_iterator`.  The `iterator_facade` will create the wrong default thing.

---
TYPE: IMPROVEMENT
DESC: Write iter_swap for zip_view for external sort.
KiterLuc pushed a commit that referenced this pull request May 2, 2024
This PR stacks on #4930. The actual code for implementing the PR is
fairly small, but with extensive unit tests.

The PR contains `swap` function for `zip_view` in order to be able to
use `std::sort` on a `zip_view`. The difficulty is that dereferencing a
`zip_view` iterator returns a proxy, specifically a tuple of references.
In order to enable `swap` to be found for this via ADL, we write
```c++
template <class... Ts>
  requires(std::is_swappable<Ts>::value && ...)
void swap(std::tuple<Ts...>&& x, std::tuple<Ts...>&& y) {
  using std::get;
  using std::swap;
  [&]<std::size_t... i>(std::index_sequence<i...>) {
    (swap(get<i>(x), get<i>(y)), ...);
  }(std::make_index_sequence<sizeof...(Ts)>());
}
```
and then put it into the `std` namespace:
```c++
namespace std {
using ::swap;
}
```

The provided unit tests exercise `swap`, `iter_swap`, and `sort`,
including on `zip_view`s containing `alt_var_length_view`s.

(NOTE: I didn't specifically write an `item_swap` as `std::iter_swap`
falls back to `swap`.)

The PR also changes `alt_var_length_view` and `zip_view` to derive from
`std::ranges::view_interface` rather than `std::ranges::view_base` in
order to automagically get `operator[]` for the view.

NOTES:
1. Not all implementations of `std::sort` use `std::swap` for all
problem sizes, but will use insertion sort instead for small sizes.
2. For insertion sort to work properly with a tuple of references, i.e.,
a proxy, it is important to get all of the details of `value_type` and
`reference` correct. There was a bug that took a while to track down of
`value_type` not being defined in the `zip_iterator`. The
`iterator_facade` will create the wrong default thing.

---
TYPE: IMPROVEMENT
DESC: Write iter_swap for zip_view for external sort.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants