Skip to content

Commit

Permalink
Add simplified C++23 chunk_view (#5035)
Browse files Browse the repository at this point in the history
This PR adds a class to provide a subset of the functionality of the
C++23 `std::ranges::chunk_view`. This view divides an underlying view
into a range of equal-sized chunks (the last one may be shorter than the
others). A complete description of `chunk_view` can be found on
cppreference.com, which also includes "exposition only" aspects that
point to particular implementation approaches. We did not adopt those
approaches in most cases, though the public API of the class matches the
spec, for the functions that we did implement. This view will be used to
partition the original input data from the user query into, well, chunks
that will be individually sorted and then written to a TileDB array as a
data block. Data blocks are of fixed and uniform byte size that will
contain a chunk smaller than or equal to its byte limit.

* The class assumes a` random_access_range`, which is what we have in
our buffers that we are chunking, so I didn't try to generalize any
further than that.

* The class provides a complete iterator interface via `iterator_facade`
as well as a complete view interface via `view_interface`. Aspects of
those interfaces that rely on C++23 are pended.

* Unit tests validate that `chunk_view` and its iterator meet the
expected `std::ranges` concepts.

* The unit tests do not yet test composition of this view with other
views (our custom views as well as other C++20 views). This will be
fairly extensive and the subject of its own PR.

* One todo would be to create a variable chunk class view that chunks up
the underlying view into variable-sized pieces, the better to fit into
the data blocks. However, the data blocks will be fairly large, so there
should not be too much internal fragmentation. We can always manage
internal fragmentation by making the chunks some fraction of the block
size, so even if there is some variation in chunk size, they can be
well-packed into the data blocks.

---
TYPE: IMPROVEMENT
DESC: Implementation of a `chunk_view` class to provide a subset of
C++23 chunk_view, suitable for supporting external sort.

---------

Co-authored-by: Luc Rancourt <lucrancourt@gmail.com>
  • Loading branch information
lums658 and KiterLuc authored Jun 6, 2024

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
1 parent 2593e54 commit 0fafe92
Showing 33 changed files with 1,064 additions and 72 deletions.
2 changes: 1 addition & 1 deletion experimental/tiledb/common/dag/utility/range_join.h
Original file line number Diff line number Diff line change
@@ -73,7 +73,7 @@
#include <type_traits>
#include <utility>
#include <vector>
#include "tiledb/common/util/arrow_proxy.h"
#include "tiledb/common/util/detail/arrow_proxy.h"

#include "external/include/span/span.hpp"

Original file line number Diff line number Diff line change
@@ -31,7 +31,7 @@
#include "../range_join.h"
#include "../spinlock.h"
#include "../traits.h"
#include "tiledb/common/util/arrow_proxy.h"
#include "tiledb/common/util/detail/arrow_proxy.h"

int main() {
}
1 change: 1 addition & 0 deletions tiledb/common/util/CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -27,4 +27,5 @@
include(common NO_POLICY_SCOPE)
include(object_library)

add_subdirectory(detail)
add_test_subdirectory()
4 changes: 2 additions & 2 deletions tiledb/common/util/alt_var_length_view.h
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/**
* @file alt_alt_var_length_view.h
* @file tiledb/common/util/alt_var_length_view.h
*
* @section LICENSE
*
@@ -56,7 +56,7 @@

#include <ranges>

#include "iterator_facade.h"
#include "tiledb/common/util/detail/iterator_facade.h"

/**
* A view that splits a view into subranges of variable length, as delimited by
43 changes: 43 additions & 0 deletions tiledb/common/util/block_view.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,43 @@
/**
* @file tiledb/common/util/block_view.h
*
* @section LICENSE
*
* The MIT License
*
* @copyright Copyright (c) 2024 TileDB, Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*
* @section DESCRIPTION
*
* This file implements a zip view for zipping together a set of ranges.
* It is intended to implement the zip view as defined for C++23. From
* https://en.cppreference.com/w/cpp/ranges/zip_view:
* 1) A zip_view is a range adaptor that takes one or more views, and produces
* a view whose ith element is a tuple-like value consisting of the ith elements
* of all views. The size of produced view is the minimum of sizes of all
* adapted views. 2) zip is a customization point object that constructs a
* zip_view.
*
*/

#ifndef TILEDB_BLOCK_VIEW_H
#define TILEDB_BLOCK_VIEW_H
#endif // TILEDB_BLOCK_VIEW_H
30 changes: 30 additions & 0 deletions tiledb/common/util/detail/CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
#
# tiledb/util/detail/CMakeLists.txt
#
# The MIT License
#
# Copyright (c) 2024 TileDB, Inc.
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.
#

include(common NO_POLICY_SCOPE)
include(object_library)

add_test_subdirectory()
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/**
* @file arrow_proxy.hpp
* @file tiledb/common/util/arrow_proxy.hpp
*
* @section LICENSE
*
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/**
* @file iterator_facade.h
* @file tiledb/common/util/iterator_facade.h
*
* @section LICENSE
*
@@ -42,7 +42,7 @@
#include <iterator>
#include <type_traits>

#include "arrow_proxy.h"
#include "tiledb/common/util/detail/arrow_proxy.h"

namespace detail {

33 changes: 33 additions & 0 deletions tiledb/common/util/detail/test/CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
#
# tiledb/util/detail/test/CMakeLists.txt
#
# The MIT License
#
# Copyright (c) 2024 TileDB, Inc.
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.
#

include(unit_test)

commence(unit_test iterator_facade)
this_target_sources(
unit_iterator_facade.cc
)
conclude(unit_test)
Original file line number Diff line number Diff line change
@@ -40,7 +40,7 @@
#include <catch2/catch_all.hpp>
#include <numeric>
#include <ranges>
#include "../iterator_facade.h"
#include "tiledb/common/util/detail/iterator_facade.h"

TEST_CASE("iterator_facade: Null test", "[iterator_facade][null_test]") {
REQUIRE(true);
4 changes: 2 additions & 2 deletions tiledb/common/util/permutation_view.h
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/**
* @file permutation_view.h
* @file tiledb/common/util/permutation_view.h
*
* @section LICENSE
*
@@ -50,8 +50,8 @@

#include <cassert>
#include <ranges>
#include "iterator_facade.h"
#include "proxy_sort.h"
#include "tiledb/common/util/detail/iterator_facade.h"

/**
* A view that creates a permutation of the underlying view, as determined by
47 changes: 44 additions & 3 deletions tiledb/common/util/print_types.h
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/**
* @file print_types.h
* @file tiledb/common/util/print_types.h
*
* @section LICENSE
*
@@ -28,9 +28,50 @@
* @section DESCRIPTION
*
* This file implements a compile-time debugging utility for investigating
* the specific types of objects.
* the specific types of objects. It is the equivalent of a compile-time
* `printf`, but for types and is useful for debugging compile-time issues
* related to templates. It is intended for development use only and should
* not be included in production code.
*
* Based on utility from NWGraph. Author Luke D'Alessandro.
* As a compile-time "printf" for types, it will generate a compiler error
* whose message will contain the expanded types of the variables passed to it.
* Note that the `print_types` takes *variable* as argument, not actual types
* or template parameters. The programmer will need to inspect the error
* message to determine the types of the variables.
*
* @example
*
* // See https://godbolt.org/z/3q341cs57 for running example
* #include <vector>
* #include "tiledb/common/util/print_types.h"
*
* template <class S, class T>
* void foo(const S& s, const T& t) {
* // Note we pass s and t, not S and T
* print_types(s, t);
* }
*
* int main() {
* foo(1.0, std::vector<std::vector<int>>{{1, 2}, {3, 4}});
* }
*
* This will produce an error message like the following (gcc 12.1):
* <source>:18:31: error: invalid use of incomplete type 'struct
* print_types_t<double, std::vector<std::vector<int, std::allocator<int> >,
* std::allocator<std::vector<int, std::allocator<int> > > > >' 18 | return
* print_types_t<Ts...>{};
*
* Or like the following (clang 16.0):
* <source>:18:10: error: implicit instantiation of undefined template
* 'print_types_t<double, std::vector<std::vector<int>>>' return
* print_types_t<Ts...>{};
*
* The types of the variable `s` and `t` are contained in the template
* arguments shown for `print_types`. In this case they are respectively
* `double` and `std::vector<std::vector<int>>`.
*
* This file is based on the `print_types` utility from NWGraph.
* Author Luke D'Alessandro.
*/

#ifndef TILEDB_PRINT_TYPES_H
2 changes: 1 addition & 1 deletion tiledb/common/util/proxy_sort.h
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/**
* @file proxy_sort.h
* @file tiledb/common/util/proxy_sort.h
*
* @section LICENSE
*
53 changes: 24 additions & 29 deletions tiledb/common/util/test/CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -26,56 +26,51 @@

include(unit_test)

commence(unit_test unit_alt_var_length_view)
this_target_sources(
unit_alt_var_length_view.cc
commence(unit_test alt_var_length_view)
this_target_sources(
unit_alt_var_length_view.cc
)
conclude(unit_test)

commence(unit_test unit_iterator_facade)
this_target_sources(
unit_iterator_facade.cc
commence(unit_test permutation_sort)
this_target_sources(
unit_permutation_sort.cc
)
conclude(unit_test)

commence(unit_test unit_permutation_sort)
this_target_sources(
unit_permutation_sort.cc
commence(unit_test permutation_view)
this_target_sources(
unit_permutation_view.cc
)
conclude(unit_test)

commence(unit_test unit_permutation_view)
this_target_sources(
unit_permutation_view.cc
commence(unit_test proxy_sort)
this_target_sources(
unit_proxy_sort.cc
)
conclude(unit_test)

commence(unit_test unit_proxy_sort)
this_target_sources(
unit_proxy_sort.cc
commence(unit_test sort_zip)
this_target_sources(
unit_sort_zip.cc
)
conclude(unit_test)

commence(unit_test unit_sort_zip)
this_target_sources(
unit_sort_zip.cc
commence(unit_test var_length_util)
this_target_sources(
unit_var_length_util.cc
)
conclude(unit_test)

commence(unit_test unit_var_length_util)
this_target_sources(
unit_var_length_util.cc
)
conclude(unit_test)

commence(unit_test unit_var_length_view)
this_target_sources(
commence(unit_test var_length_view)
this_target_sources(
unit_var_length_view.cc
)
conclude(unit_test)

commence(unit_test unit_zip_view)
this_target_sources(
unit_zip_view.cc
commence(unit_test block_view)
this_target_sources(
unit_block_view.cc
)
conclude(unit_test)

2 changes: 1 addition & 1 deletion tiledb/common/util/test/unit_alt_var_length_view.cc
Original file line number Diff line number Diff line change
@@ -33,7 +33,7 @@
#include <algorithm>
#include <catch2/catch_all.hpp>
#include <vector>
#include "../alt_var_length_view.h"
#include "tiledb/common/util/alt_var_length_view.h"

TEST_CASE(
"alt_var_length_view: Null test", "[alt_var_length_view][null_test]") {
38 changes: 38 additions & 0 deletions tiledb/common/util/test/unit_block_view.cc
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
/**
* @file unit_block_view.cc
*
* @section LICENSE
*
* The MIT License
*
* @copyright Copyright (c) 2024 TileDB, Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*
* @section DESCRIPTION
*
* This file implements unit tests for the block_view class.
*/

#include <catch2/catch_all.hpp>
#include "tiledb/common/util/block_view.h"

TEST_CASE("block_view: null test", "[block_view][null test]") {
REQUIRE(true);
}
Loading

0 comments on commit 0fafe92

Please sign in to comment.