static_reduction_map #98

Closed
wants to merge 75 commits into from
Changes from 71 commits
033d2bc
Added initial static_reduction_map files.
jrhemstad Dec 16, 2020
fe606cd
Add template parameter for reduction binary op.
jrhemstad Jan 4, 2021
fd3b98f
Fix static_assert for ReductionOp::value_type.
jrhemstad Jan 4, 2021
a3678fb
CG reduction insert implementation.
jrhemstad Jan 5, 2021
5a65bf6
Cleanup of CG insert.
jrhemstad Jan 5, 2021
28e0995
Pass reduction op to device view ctors.
jrhemstad Jan 5, 2021
8dc64ee
Add pair ctor for constructing from two elements.
jrhemstad Jan 5, 2021
573bce2
Allow bulk insert kernel to work on iterators over tuples.
jrhemstad Jan 5, 2021
d9236e5
Add device decorator to reduction op definition.
jrhemstad Jan 5, 2021
89ed44e
Add get_op function to allow accessing the op from
jrhemstad Jan 5, 2021
e28db80
Make insert return a bool after all.
jrhemstad Jan 5, 2021
0eeac20
Use get_op in implementation.
jrhemstad Jan 5, 2021
fa31c81
Make insert return a bool.
jrhemstad Jan 5, 2021
ab81b2b
Correct insert to return if the key was the first key inserted.
jrhemstad Jan 5, 2021
46f9b73
First test verifying size passed.
jrhemstad Jan 5, 2021
8aebabb
Update CG insert logic.
jrhemstad Jan 6, 2021
9fb930e
Add more tests.
jrhemstad Jan 6, 2021
24261b2
Add test for inserting all unique keys.
jrhemstad Jan 7, 2021
e635e31
Use relaxed fetch_add.
jrhemstad Jan 7, 2021
d749445
Update the slot references each iteration.
jrhemstad Jan 7, 2021
ca9f7d6
Increase size of unique key test.
jrhemstad Jan 7, 2021
9eebd17
Make map size function of number of keys.
jrhemstad Jan 7, 2021
212b8f6
Add other agg ops.
jrhemstad Jan 7, 2021
cda527a
Add custom binary op.
jrhemstad Jan 7, 2021
7c1af0f
Return old value in custom op.
jrhemstad Jan 7, 2021
3f1b59d
reduction map benchmarks.
jrhemstad Apr 8, 2021
71a0122
Merge remote-tracking branch 'origin/dev' into reduction-map
jrhemstad May 13, 2021
2a38d70
Remove redundant ctor.
jrhemstad May 13, 2021
f2d1a26
Add initial static_reduction_map example.
jrhemstad May 13, 2021
3c79701
Remove cuda_memcmp header.
jrhemstad May 13, 2021
8261d93
Add unsafe accessors to raw slots via reinterpret_cast.
jrhemstad May 19, 2021
c6daa09
Add retreive_all implementation.
jrhemstad May 19, 2021
62a99ab
Add retrieve_all to example.
jrhemstad May 19, 2021
a526c16
Merge branch 'dev' into reduction-map
sleeepyjack Jul 23, 2021
c1fe449
Sync static_reduction_map with latest changes in static_map.
sleeepyjack Aug 1, 2021
fb9c0ec
Tests for static_reduction_map added.
sleeepyjack Aug 1, 2021
e8e5461
Benchmarks for static_reduction_map added + reduce-by-key performance…
sleeepyjack Aug 1, 2021
1d97a6f
Added CUDA stream support for static_reduction_map.
sleeepyjack Aug 1, 2021
b4351fc
Fix custom reduction op implementation and add exponential backoff st…
sleeepyjack Aug 2, 2021
80ef0ee
Parameter grid search for CAS loop backoff added.
sleeepyjack Aug 2, 2021
54e2022
Reduce-by-key performance analysis.
sleeepyjack Aug 4, 2021
cc853c3
Additional benchmark setups for static_reduction_map.
sleeepyjack Aug 4, 2021
28069d0
Make key_generator.hpp usable from other benchmark suites.
sleeepyjack Aug 5, 2021
26787ad
Fix for make_from_uninitialized_slots.
sleeepyjack Aug 5, 2021
e2a81b3
[WIP] Added benchmark for static_reduction_map in shared memory.
sleeepyjack Aug 5, 2021
9807d8f
Added definition for slot_type.
sleeepyjack Aug 6, 2021
0c1bd4d
Change visibility of get_slots() from protected to public. (Fix for d…
sleeepyjack Aug 6, 2021
58a2ead
Move test helpers to util.hpp.
sleeepyjack Aug 7, 2021
f4979b1
Added tests for custom_op and shared memory hash table.
sleeepyjack Aug 7, 2021
01e75bd
Added definition for slot_type.
sleeepyjack Aug 6, 2021
1e86659
Change visibility of get_slots() from protected to public. (Fix for d…
sleeepyjack Aug 6, 2021
21be2e1
Move test helpers to util.hpp.
sleeepyjack Aug 7, 2021
53bfe27
Added tests for custom_op and shared memory hash table.
sleeepyjack Aug 7, 2021
f4c703a
Added benchmark for static_reduction_map in shared memory.
sleeepyjack Aug 8, 2021
492b4cc
Merge branch 'smem-bench' into reduction-map
sleeepyjack Aug 8, 2021
ec76e8a
Add example for shared memory hash table.
sleeepyjack Aug 9, 2021
961e88b
Fix for static_reduction_map::contains.
sleeepyjack Aug 9, 2021
5931c9f
Extend parameter range for shared memory hash table benchmark.
sleeepyjack Aug 9, 2021
902b93a
Size computation using thrust::count_if. Asynchronous bulk operations.
sleeepyjack Aug 9, 2021
d440243
Add throughput column to nvbench benchmarks.
sleeepyjack Aug 10, 2021
4339b2b
Fix for reductions over FP types.
sleeepyjack Aug 24, 2021
6293117
Add support for both static/dynamic extent of device_view.
jrhemstad Oct 13, 2021
cddd733
Add predicated store to prevent optimization.
jrhemstad Oct 13, 2021
c43f7fb
Add multiplicity to benchmark.
jrhemstad Oct 14, 2021
8f965b1
Merge pull request #1 from jrhemstad/static-extent-reduction-map
sleeepyjack Oct 14, 2021
bef9aa6
Add appropriate host qualifier to device_view_base functions.
jrhemstad Oct 14, 2021
a99b56f
Import extent static member into device_mutable_view.
jrhemstad Oct 14, 2021
e87fc9d
Add static_assert for factory from static array.
jrhemstad Oct 14, 2021
607f79f
Merge branch 'dev' into reduction-map
sleeepyjack Feb 7, 2022
6d05ebb
Merge branch 'dev' into reduction-map
sleeepyjack Feb 7, 2022
5f24429
Minor fixes addressing reviewer comments
sleeepyjack Mar 21, 2022
c2e4e62
Move reduction operators to include/cuco/
sleeepyjack Mar 23, 2022
e1361a3
Added a tag to ensure that only valid reduction functors can be used
sleeepyjack Mar 23, 2022
d196de5
Move common kernels to a new file
PointKernel May 26, 2022
e904dca
Updates: incorporate new kernel header
PointKernel May 26, 2022
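Several commit titles above mention a custom binary op that returns the old value, relaxed atomics, and a CAS loop with exponential backoff. The code behind those commits is not shown on this page, so the following is only a host-side analogue built on `std::atomic`, not the CUDA implementation from the PR. The name `cas_reduce` and the backoff constants are hypothetical.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical host-side analogue (not code from the PR): apply a binary
// reduction `op` to `slot` with a compare-exchange loop and return the value
// that was stored before the update.
template <typename T, typename BinaryOp>
T cas_reduce(std::atomic<T>& slot, T value, BinaryOp op)
{
  T old = slot.load(std::memory_order_relaxed);
  unsigned backoff_ns = 8;  // illustrative starting delay, not a tuned value

  // On failure, compare_exchange_weak reloads `old` with the current slot
  // value, so op(old, value) is recomputed on every retry.
  while (!slot.compare_exchange_weak(old, op(old, value), std::memory_order_relaxed)) {
    std::this_thread::sleep_for(std::chrono::nanoseconds{backoff_ns});
    if (backoff_ns < 1024) { backoff_ns *= 2; }  // exponential backoff with a cap
  }
  return old;
}
```

Calling `cas_reduce(slot, v, op)` with a max or sum lambda leaves the reduced value in `slot`; the returned old value lets a caller tell whether its contribution changed the slot.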
44 changes: 22 additions & 22 deletions .clang-format
@@ -15,7 +15,7 @@ AlignTrailingComments: true
AllowAllArgumentsOnNextLine: true
AllowAllConstructorInitializersOnNextLine: true
AllowAllParametersOfDeclarationOnNextLine: true
AllowShortBlocksOnASingleLine: true
AllowShortBlocksOnASingleLine: true
AllowShortCaseLabelsOnASingleLine: true
AllowShortEnumsOnASingleLine: true
AllowShortFunctionsOnASingleLine: All
@@ -27,21 +27,21 @@ AlwaysBreakAfterDefinitionReturnType: None
AlwaysBreakAfterReturnType: None
AlwaysBreakBeforeMultilineStrings: true
AlwaysBreakTemplateDeclarations: Yes
BinPackArguments: false
BinPackArguments: false
PointKernel marked this conversation as resolved.
BinPackParameters: false
BraceWrapping:
AfterClass: false
AfterClass: false
AfterControlStatement: false
AfterEnum: false
AfterFunction: false
AfterNamespace: false
AfterObjCDeclaration: false
AfterStruct: false
AfterUnion: false
AfterExternBlock: false
BeforeCatch: false
BeforeElse: false
IndentBraces: false
AfterEnum: false
AfterFunction: false
AfterNamespace: false
AfterObjCDeclaration: false
AfterStruct: false
AfterUnion: false
AfterExternBlock: false
BeforeCatch: false
BeforeElse: false
IndentBraces: false
# disabling the below splits, else, they'll just add to the vertical length of source files!
SplitEmptyFunction: false
SplitEmptyRecord: false
@@ -56,7 +56,7 @@ BreakConstructorInitializers: BeforeColon
BreakInheritanceList: BeforeColon
BreakStringLiterals: true
ColumnLimit: 100
CommentPragmas: '^ IWYU pragma:'
CommentPragmas: "^ IWYU pragma:"
CompactNamespaces: false
ConstructorInitializerAllOnOneLineOrOnePerLine: true
# Kept the below 2 to be the same as `IndentWidth` to keep everything uniform
@@ -72,16 +72,16 @@ ForEachMacros:
- Q_FOREACH
- BOOST_FOREACH
IncludeBlocks: Preserve
IncludeIsMainRegex: '([-_](test|unittest))?$'
IncludeIsMainRegex: "([-_](test|unittest))?$"
IndentCaseLabels: true
IndentPPDirectives: None
IndentWidth: 2
IndentWidth: 2
IndentWrappedFunctionNames: false
JavaScriptQuotes: Leave
JavaScriptWrapImports: true
KeepEmptyLinesAtTheStartOfBlocks: false
MacroBlockBegin: ''
MacroBlockEnd: ''
MacroBlockBegin: ""
MacroBlockEnd: ""
MaxEmptyLinesToKeep: 1
NamespaceIndentation: None
ObjCBinPackProtocolList: Never
@@ -105,9 +105,9 @@ RawStringFormats:
- cpp
- Cpp
- CPP
- 'c++'
- 'C++'
CanonicalDelimiter: ''
- "c++"
- "C++"
CanonicalDelimiter: ""
- Language: TextProto
Delimiters:
- pb
@@ -122,7 +122,7 @@ RawStringFormats:
- PARSE_TEXT_PROTO
- ParseTextOrDie
- ParseTextProtoOrDie
CanonicalDelimiter: ''
CanonicalDelimiter: ""
BasedOnStyle: google
# Enabling comment reflow causes doxygen comments to be messed up in their formats!
ReflowComments: true
41 changes: 25 additions & 16 deletions benchmarks/CMakeLists.txt
@@ -1,5 +1,5 @@
#=============================================================================
# Copyright (c) 2018-2021, NVIDIA CORPORATION.
# Copyright (c) 2018-2022, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -39,11 +39,11 @@ CPMAddPackage(
###################################################################################################

###################################################################################################
function(ConfigureBench BENCH_NAME BENCH_SRC)
add_executable(${BENCH_NAME} "${BENCH_SRC}")
function(ConfigureBench BENCH_NAME)
add_executable(${BENCH_NAME} ${ARGN})
set_target_properties(${BENCH_NAME} PROPERTIES
POSITION_INDEPENDENT_CODE ON
RUNTIME_OUTPUT_DIRECTORY "${CMAKE_BINARY_DIR}/gbenchmarks")
RUNTIME_OUTPUT_DIRECTORY "${CMAKE_BINARY_DIR}/benchmarks")
target_include_directories(${BENCH_NAME} PRIVATE
"${CMAKE_CURRENT_SOURCE_DIR}")
target_compile_options(${BENCH_NAME} PRIVATE --expt-extended-lambda --expt-relaxed-constexpr -Xcompiler -Wno-subobject-linkage)
@@ -59,10 +59,10 @@ function(ConfigureNVBench BENCH_NAME)
add_executable(${BENCH_NAME} ${ARGN})
set_target_properties(${BENCH_NAME} PROPERTIES
POSITION_INDEPENDENT_CODE ON
RUNTIME_OUTPUT_DIRECTORY "${CMAKE_BINARY_DIR}/nvbenchmarks")
RUNTIME_OUTPUT_DIRECTORY "${CMAKE_BINARY_DIR}/benchmarks"
COMPILE_FLAGS -DNVBENCH_MODULE)
Collaborator

What is the NVBENCH_MODULE definition?

Collaborator Author

sleeepyjack Mar 23, 2022

The idea was to reuse key_generator.hpp for both the gbench and nvbench setups. See:

#if defined(NVBENCH_MODULE)
#include <nvbench/nvbench.cuh>

NVBENCH_DECLARE_ENUM_TYPE_STRINGS(
  // Enum type:
  dist_type,
  // Callable to generate input strings:
  // Short identifier used for tables, command-line args, etc.
  // Used when context is available to figure out the enum type.
  [](dist_type d) {
    switch (d) {
      case dist_type::GAUSSIAN: return "GAUSSIAN";
      case dist_type::GEOMETRIC: return "GEOMETRIC";
      case dist_type::UNIFORM: return "UNIFORM";
      case dist_type::UNIQUE: return "UNIQUE";
      case dist_type::SAME: return "SAME";
      default: return "ERROR";
    }
  },
  // Callable to generate descriptions:
  // If non-empty, these are used in `--list` to describe values.
  // Used when context may not be available to figure out the type from the
  // input string.
  // Just use `[](auto) { return std::string{}; }` if you don't want these.
  [](auto) { return std::string{}; })
#endif

here: https://github.com/sleeepyjack/cuCollections/blob/5f244292990dbde9d5311d28ede72e74803250ac/benchmarks/key_generator.hpp#L25

Collaborator Author

Is there another way of detecting if nvbench is included? I initially thought I could use the include guard definition but nvbench uses #pragma once iirc.

Collaborator

Ah, I see. This is fine then. I'd suggest renaming to CUCO_USING_NVBENCH.
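
The thread above settles on a compile-time definition to gate the nvbench-only part of key_generator.hpp. A minimal sketch of that pattern follows. It uses the suggested name `CUCO_USING_NVBENCH` (the diff itself still passes `-DNVBENCH_MODULE`), and `dist_type_name` is a hypothetical helper that does not appear in the PR.

```cpp
#include <string>

enum class dist_type { GAUSSIAN, GEOMETRIC, UNIFORM, UNIQUE, SAME };

// Hypothetical helper (not in the PR): a single enum-to-string mapping that
// both gbench and nvbench builds can share.
inline char const* dist_type_name(dist_type d)
{
  switch (d) {
    case dist_type::GAUSSIAN: return "GAUSSIAN";
    case dist_type::GEOMETRIC: return "GEOMETRIC";
    case dist_type::UNIFORM: return "UNIFORM";
    case dist_type::UNIQUE: return "UNIQUE";
    case dist_type::SAME: return "SAME";
  }
  return "ERROR";
}

// Compiled only when the build defines CUCO_USING_NVBENCH (assumed name), so
// gbench targets never need the nvbench headers on their include path.
#if defined(CUCO_USING_NVBENCH)
#include <nvbench/nvbench.cuh>

NVBENCH_DECLARE_ENUM_TYPE_STRINGS(
  dist_type,
  [](dist_type d) { return dist_type_name(d); },
  [](auto) { return std::string{}; })
#endif
```

Because nvbench uses `#pragma once` rather than a named include guard, a definition supplied by the build system is the simplest signal available to the header.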

target_include_directories(${BENCH_NAME} PRIVATE
"${CMAKE_CURRENT_SOURCE_DIR}")
#"${NVBench_SOURCE_DIR}")
target_compile_options(${BENCH_NAME} PRIVATE --expt-extended-lambda --expt-relaxed-constexpr)
target_link_libraries(${BENCH_NAME} PRIVATE
nvbench::main
@@ -76,13 +76,27 @@ endfunction(ConfigureNVBench)

###################################################################################################
# - dynamic_map benchmarks ------------------------------------------------------------------------
set(DYNAMIC_MAP_BENCH_SRC "${CMAKE_CURRENT_SOURCE_DIR}/hash_table/dynamic_map_bench.cu")
ConfigureBench(DYNAMIC_MAP_BENCH "${DYNAMIC_MAP_BENCH_SRC}")
ConfigureBench(DYNAMIC_MAP_BENCH
hash_table/dynamic_map_bench.cu)

###################################################################################################
# - static_map benchmarks -------------------------------------------------------------------------
set(STATIC_MAP_BENCH_SRC "${CMAKE_CURRENT_SOURCE_DIR}/hash_table/static_map_bench.cu")
ConfigureBench(STATIC_MAP_BENCH "${STATIC_MAP_BENCH_SRC}")
ConfigureBench(STATIC_MAP_BENCH
hash_table/static_map_bench.cu)

###################################################################################################
# - static_reduction_map benchmarks ---------------------------------------------------------------
ConfigureNVBench(STATIC_REDUCTION_MAP_BENCH
hash_table/static_reduction_map/insert_bench.cu)
ConfigureNVBench(STATIC_REDUCTION_MAP_PARAM_SWEEP
hash_table/static_reduction_map/param_sweep.cu)

###################################################################################################
# - reduce-by-key benchmarks ----------------------------------------------------------------------
ConfigureNVBench(REDUCE_BY_KEY_BENCH
reduce_by_key/cuco_reduce_by_key_bench.cu
reduce_by_key/cub_reduce_by_key_bench.cu
reduce_by_key/thrust_reduce_by_key_bench.cu)

###################################################################################################
# - static_multimap benchmarks --------------------------------------------------------------------
@@ -94,9 +108,4 @@ ConfigureNVBench(STATIC_MULTIMAP_BENCH
hash_table/static_multimap/retrieve_bench.cu)

ConfigureNVBench(RETRIEVE_BENCH
hash_table/static_multimap/optimal_retrieve_bench.cu)

###################################################################################################
# - reduce_by_key benchmarks ----------------------------------------------------------------------
set(RBK_BENCH_SRC "${CMAKE_CURRENT_SOURCE_DIR}/reduce_by_key/reduce_by_key.cu")
ConfigureBench(RBK_BENCH "${RBK_BENCH_SRC}")
hash_table/static_multimap/optimal_retrieve_bench.cu)
716 changes: 716 additions & 0 deletions benchmarks/analysis/notebooks/rbk_bench.ipynb

Large diffs are not rendered by default.

38 changes: 5 additions & 33 deletions benchmarks/hash_table/dynamic_map_bench.cu
@@ -17,39 +17,9 @@
#include <benchmark/benchmark.h>
#include <cuco/dynamic_map.cuh>
#include <iostream>
#include <random>
#include <key_generator.hpp>
#include <synchronization.hpp>

enum class dist_type { UNIQUE, UNIFORM, GAUSSIAN };

template <dist_type Dist, typename Key, typename OutputIt>
static void generate_keys(OutputIt output_begin, OutputIt output_end)
{
auto num_keys = std::distance(output_begin, output_end);

std::random_device rd;
std::mt19937 gen{rd()};

switch (Dist) {
case dist_type::UNIQUE:
for (auto i = 0; i < num_keys; ++i) {
output_begin[i] = i;
}
break;
case dist_type::UNIFORM:
for (auto i = 0; i < num_keys; ++i) {
output_begin[i] = std::abs(static_cast<Key>(gen()));
}
break;
case dist_type::GAUSSIAN:
std::normal_distribution<> dg{1e9, 1e7};
for (auto i = 0; i < num_keys; ++i) {
output_begin[i] = std::abs(static_cast<Key>(dg(gen)));
}
break;
}
}

static void gen_final_size(benchmark::internal::Benchmark* b)
{
for (auto size = 10'000'000; size <= 150'000'000; size += 20'000'000) {
@@ -64,11 +34,12 @@ static void BM_dynamic_insert(::benchmark::State& state)

std::size_t num_keys = state.range(0);
std::size_t initial_size = 1 << 27;
std::size_t multiplicity = 1;
sleeepyjack marked this conversation as resolved.

std::vector<Key> h_keys(num_keys);
std::vector<cuco::pair_type<Key, Value>> h_pairs(num_keys);

generate_keys<Dist, Key>(h_keys.begin(), h_keys.end());
generate_keys<Key>(Dist, h_keys.begin(), h_keys.end(), multiplicity);

for (auto i = 0; i < num_keys; ++i) {
Key key = h_keys[i];
@@ -101,11 +72,12 @@ static void BM_dynamic_search_all(::benchmark::State& state)

std::size_t num_keys = state.range(0);
std::size_t initial_size = 1 << 27;
std::size_t multiplicity = 1;

std::vector<Key> h_keys(num_keys);
std::vector<cuco::pair_type<Key, Value>> h_pairs(num_keys);

generate_keys<Dist, Key>(h_keys.begin(), h_keys.end());
generate_keys<Key>(Dist, h_keys.begin(), h_keys.end(), multiplicity);

for (auto i = 0; i < num_keys; ++i) {
Key key = h_keys[i];
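
The hunks above replace the per-file `generate_keys<Dist, Key>(begin, end)` with a shared `generate_keys<Key>(Dist, begin, end, multiplicity)` from key_generator.hpp, which this page does not show. The following is a sketch of what a generator with that call shape could look like, not the PR's implementation. It assumes `multiplicity` means the average number of occurrences per key, and it reuses the distribution parameters from the removed code.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <iterator>
#include <random>

enum class dist_type { UNIQUE, UNIFORM, GAUSSIAN };

// Sketch only: the real key_generator.hpp is not shown in this diff.
// Assumption: `multiplicity` is the average number of times each key occurs.
// Key must be an integer type accepted by std::uniform_int_distribution.
template <typename Key, typename OutputIt>
void generate_keys(dist_type dist,
                   OutputIt output_begin,
                   OutputIt output_end,
                   std::size_t multiplicity = 1)
{
  std::ptrdiff_t const num_keys = std::distance(output_begin, output_end);
  if (num_keys <= 0) { return; }

  std::ptrdiff_t const per_key =
    multiplicity == 0 ? 1 : static_cast<std::ptrdiff_t>(multiplicity);
  // Number of distinct keys to draw from (at least one).
  std::ptrdiff_t const num_distinct = std::max<std::ptrdiff_t>(1, num_keys / per_key);

  std::mt19937 gen{std::random_device{}()};

  switch (dist) {
    case dist_type::UNIQUE:
      // Every key appears exactly once; multiplicity is ignored here.
      for (std::ptrdiff_t i = 0; i < num_keys; ++i) {
        output_begin[i] = static_cast<Key>(i);
      }
      break;
    case dist_type::UNIFORM: {
      std::uniform_int_distribution<Key> pick{0, static_cast<Key>(num_distinct - 1)};
      for (std::ptrdiff_t i = 0; i < num_keys; ++i) {
        output_begin[i] = pick(gen);
      }
      break;
    }
    case dist_type::GAUSSIAN: {
      std::normal_distribution<> pick{1e9, 1e7};  // same parameters as the removed code
      for (std::ptrdiff_t i = 0; i < num_keys; ++i) {
        output_begin[i] = static_cast<Key>(std::abs(pick(gen)));
      }
      break;
    }
  }
}
```

Taking the distribution as a runtime argument instead of a template parameter is what lets one header serve both gbench and nvbench targets, where the distribution can be a benchmark axis.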
50 changes: 11 additions & 39 deletions benchmarks/hash_table/static_map_bench.cu
@@ -18,40 +18,10 @@
#include <benchmark/benchmark.h>
#include <fstream>
#include <iostream>
#include <random>
#include <key_generator.hpp>
#include <thrust/device_vector.h>
#include <thrust/for_each.h>

enum class dist_type { UNIQUE, UNIFORM, GAUSSIAN };

template <dist_type Dist, typename Key, typename OutputIt>
static void generate_keys(OutputIt output_begin, OutputIt output_end)
{
auto num_keys = std::distance(output_begin, output_end);

std::random_device rd;
std::mt19937 gen{rd()};

switch (Dist) {
case dist_type::UNIQUE:
for (auto i = 0; i < num_keys; ++i) {
output_begin[i] = i;
}
break;
case dist_type::UNIFORM:
for (auto i = 0; i < num_keys; ++i) {
output_begin[i] = std::abs(static_cast<Key>(gen()));
}
break;
case dist_type::GAUSSIAN:
std::normal_distribution<> dg{1e9, 1e7};
for (auto i = 0; i < num_keys; ++i) {
output_begin[i] = std::abs(static_cast<Key>(dg(gen)));
}
break;
}
}

/**
* @brief Generates input sizes and hash table occupancies
*
@@ -70,14 +40,15 @@ static void BM_static_map_insert(::benchmark::State& state)
{
using map_type = cuco::static_map<Key, Value>;

std::size_t num_keys = state.range(0);
float occupancy = state.range(1) / float{100};
std::size_t size = num_keys / occupancy;
std::size_t num_keys = state.range(0);
float occupancy = state.range(1) / float{100};
std::size_t size = num_keys / occupancy;
std::size_t multiplicity = 1;

std::vector<Key> h_keys(num_keys);
std::vector<cuco::pair_type<Key, Value>> h_pairs(num_keys);

generate_keys<Dist, Key>(h_keys.begin(), h_keys.end());
generate_keys<Key>(Dist, h_keys.begin(), h_keys.end(), multiplicity);

for (auto i = 0; i < num_keys; ++i) {
Key key = h_keys[i];
@@ -108,9 +79,10 @@ static void BM_static_map_search_all(::benchmark::State& state)
{
using map_type = cuco::static_map<Key, Value>;

std::size_t num_keys = state.range(0);
float occupancy = state.range(1) / float{100};
std::size_t size = num_keys / occupancy;
std::size_t num_keys = state.range(0);
float occupancy = state.range(1) / float{100};
std::size_t size = num_keys / occupancy;
std::size_t multiplicity = 1;

map_type map{size, -1, -1};
auto view = map.get_device_mutable_view();
@@ -120,7 +92,7 @@ static void BM_static_map_search_all(::benchmark::State& state)
std::vector<cuco::pair_type<Key, Value>> h_pairs(num_keys);
std::vector<Value> h_results(num_keys);

generate_keys<Dist, Key>(h_keys.begin(), h_keys.end());
generate_keys<Key>(Dist, h_keys.begin(), h_keys.end(), multiplicity);

for (auto i = 0; i < num_keys; ++i) {
Key key = h_keys[i];
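
Both static_map benchmarks derive the table capacity from an integer occupancy percentage passed in `state.range(1)`. The arithmetic can be isolated as below; `capacity_for` is a hypothetical name used only for this illustration.

```cpp
#include <cstddef>

// Mirrors the sizing arithmetic in the benchmarks above: occupancy arrives as
// an integer percentage, is converted to a fraction, and divides the key count.
inline std::size_t capacity_for(std::size_t num_keys, int occupancy_percent)
{
  float const occupancy = occupancy_percent / float{100};
  // The division happens in float, so key counts above 2^24 are rounded
  // before the result is converted back to std::size_t.
  return static_cast<std::size_t>(num_keys / occupancy);
}
```

For example, 100 keys at 50% occupancy give a capacity of 200 slots.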