Histogram skeleton. #3502
base: main
Conversation
int dim = ndims - hist_dim;

// Starting from the inner dimension, check whether we should reduce this axis.
// If we can do so, continue until we can collapse the next dimension.
auto raxis = reduction_axes.rbegin();

for (; raxis != reduction_axes.rend(); ++raxis) {
  // TODO: Handle multidimensional histograms correctly
  if (dim != *raxis) {
    break;
  }
  --dim;
}

// If there are no reduction axes left, we won't need to transpose any axis.
if (raxis == reduction_axes.rend())
  return true;

return false;
This seems a bit overcomplicated - isn't it equivalent to the following?
Suggested change:

if (reduction_axes.size() != ndims - hist_dims)
  return false;  // not all dimensions reduced
for (int i = 0; i < ndims - hist_dims; i++) {
  if (reduction_axes[i] != i)
    return false;
}
return true;
SmallVector<int, 6> axes_order;
axes_order.reserve(ndims);
for (int axis : non_rediction_axes) {
Suggested change:
for (int axis : non_reduction_axes) {
const int ndims = input_shapes.sample_dim();

auto shape_span
auto non_rediction_axes = GetNonReductionAxes(ndims);
Suggested change:
auto non_reduction_axes = GetNonReductionAxes(ndims);
However, how about using bit masks for reduced axes? This would greatly simplify the code. You'd create a mask once and then you'd just check whether a given axis is marked as reduced or not. We even have a bitmask class, so there's no need to manually shift/mask stuff.
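The bitmask idea could look roughly like this. This is a minimal sketch, assuming std::bitset as a stand-in for DALI's bitmask class; MakeReductionMask and IsFullReduction are hypothetical helper names, not DALI API:

```cpp
#include <bitset>
#include <vector>

// Mark reduced axes once; afterwards "is this axis reduced?" is an O(1) lookup
// instead of iterator bookkeeping over a sorted axis list.
constexpr int kMaxDims = 32;

std::bitset<kMaxDims> MakeReductionMask(const std::vector<int> &reduction_axes) {
  std::bitset<kMaxDims> mask;
  for (int axis : reduction_axes)
    mask.set(axis);
  return mask;
}

// True iff exactly the axes in front of the histogram dimension(s) are reduced.
bool IsFullReduction(const std::bitset<kMaxDims> &mask, int ndims, int hist_dim) {
  for (int axis = 0; axis < ndims - hist_dim; ++axis)
    if (!mask.test(axis))
      return false;
  return mask.count() == static_cast<size_t>(ndims - hist_dim);
}
```

With a mask like this, the reverse-iterator loop above collapses into a straight prefix check.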
return false;
}

// Collapses all inner dimensions
Can't we also collapse all outer dimensions?
After rearranging the axes so that we have
non-reduced | reduced | [channel]
we can then collapse (or expand!) each of these groups, so that we have exactly one non-reduced dimension, one reduced dimension, and one channel dimension.
If it happens that one of these groups is empty, we can always insert a phony axis with extent 1.
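The regrouping described above could be sketched as follows. This is a hypothetical illustration (GroupedAxisOrder is not DALI API; the channel axis is assumed to be passed as -1 when absent):

```cpp
#include <algorithm>
#include <vector>

// Build a permutation that puts all non-reduced axes first, then all reduced
// axes, with the channel axis (if any) kept last. After transposing with this
// order, each group can be collapsed into a single dimension, or a phony
// extent-1 axis inserted when a group is empty.
std::vector<int> GroupedAxisOrder(int ndims, const std::vector<int> &reduction_axes,
                                  int channel_axis /* -1 if none */) {
  std::vector<int> order;
  auto is_reduced = [&](int a) {
    return std::find(reduction_axes.begin(), reduction_axes.end(), a) != reduction_axes.end();
  };
  for (int a = 0; a < ndims; ++a)  // non-reduced, non-channel axes first
    if (!is_reduced(a) && a != channel_axis)
      order.push_back(a);
  for (int a = 0; a < ndims; ++a)  // then the reduced axes
    if (is_reduced(a) && a != channel_axis)
      order.push_back(a);
  if (channel_axis >= 0)           // channel axis last
    order.push_back(channel_axis);
  return order;
}
```

For an "HWC" input with H reduced and C as the channel axis, this yields the order {W, H, C}, i.e. non-reduced | reduced | channel.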
int hist_dim = 1) {
  int dim = ndims - hist_dim;

  DALI_ENFORCE(dim >= 1);
I don't think simple queries like that should throw. Passing hist_dim < 1 constitutes a coding error. Since this function is internal, assert should be enough.
include/dali/core/tensor_shape.h (outdated)

TensorListShape<ret_ndim> result(nshapes, dyn_out_ndim);

if (out_ndim == 0) {
Suggested change:
if (dyn_out_ndim == 0) {
include/dali/core/tensor_shape.h (outdated)

const int dyn_out_ndim = shape.size() - 1;

int nshapes = shape[0];
Suggested change:
int outer_extent = shape[0];
or
int nsamples = shape[0];
}

void HistogramCPU::PrepareInputShapesForTranspose(const TensorListShape<> &input_shapes,
                                                  const SmallVector<int, 6> &non_reduction_axes) {
I think it would be easier to work with a bit mask or a set.
~HistogramCPU() override = default;

 private:
  int VerifyRangeArguments(const workspace_t<CPUBackend> &ws, int num_samples);
Grammar Nazi: Verify -> Validate
auto lo_view = view<const float>(ranges_lo);
auto hi_view = view<const float>(ranges_hi);

int hist_dim = ranges_lo.num_samples() / num_samples;
Perhaps it would be better to enforce an exact shape (scalar when there is no channel dimension and [num_channels] otherwise).
int HistogramCPU::VerifyNonUniformRangeArguments(const workspace_t<CPUBackend> &ws, int num_samples) {
  assert(!uniform_);

  DALI_ENFORCE(ws.NumInput() % 2 == 1, "Should have both ranges");  // FIXME message
Waaait... what?
Signed-off-by: Piotr Rak <[email protected]>
@@ -0,0 +1,18 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
Suggested change:
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
To be precise:
# Copyright (c) 2019-2022, NVIDIA CORPORATION. All rights reserved.
or
# Copyright (c) 2019, 2022, NVIDIA CORPORATION. All rights reserved.
# Get all the source files and dump test files
collect_headers(DALI_INST_HDRS PARENT_SCOPE)
collect_sources(DALI_OPERATOR_SRCS PARENT_SCOPE)
collect_test_sources(DALI_OPERATOR_TEST_SRCS PARENT_SCOPE)
nitpick: missing empty line at the end
@@ -0,0 +1,102 @@
// Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Suggested change:
// Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
@@ -0,0 +1,60 @@
// Copyright (c) 2020-2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Suggested change:
// Copyright (c) 2020-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
@@ -993,7 +993,7 @@ inline T OpSchema::GetDefaultValueForArgument(const std::string &s) const {
}

#define DALI_SCHEMA_REG(OpName) \
-  int DALI_OPERATOR_SCHEMA_REQUIRED_FOR_##OpName() { \
+  int CONCAT_2(DALI_OPERATOR_SCHEMA_REQUIRED_FOR_, OpName)() { \
Just a note. In general, I think it's best to keep unrelated changes separate (separate small PR)
template <>
struct CVMatType<uint8_t> {
  static int get(int nchannel) noexcept {
    return CV_MAKETYPE(CV_8U, nchannel);
  }
};
The struct seems unnecessary. How about:
template <typename Ty_>
int CVMatType(int nchannel) {
return CV_MAKETYPE(CV_8U, nchannel);
}
std::vector<Type *> tmp_pointers;
tmp_pointers.reserve(transposed_shapes.num_samples());

for (int i = 0; i < transposed_shapes.num_samples(); ++i) {
  auto tmp = scratch.template AllocTensor<mm::memory_kind::host, Type>(transposed_shapes[i]);
  tmp_pointers.push_back(tmp.data);
}

TensorListView<StorageCPU, Type> transpose_out_view(std::move(tmp_pointers),
                                                    std::move(transposed_shapes));
Suggested change:
auto transpose_out_view = scratch.template AllocTensorList<mm::memory_kind::host, Type>(std::move(transposed_shapes));
template <typename Type, typename ScratchAlloc, typename Coll>
TensorListView<StorageCPU, const Type> transpose_view(
    dali::ThreadPool &thread_pool, ScratchAlloc &scratch,
    const TensorListView<StorageCPU, const Type> &in_view, const Coll &transpose_axes_order) {
Suggested change:
template <typename Type>
TensorListView<StorageCPU, const Type> transpose_view(
    dali::ThreadPool &thread_pool, Scratchpad &scratch,
    const TensorListView<StorageCPU, const Type> &in_view, span<const int> transpose_axes_order) {
Scratchpad is an interface. Also, I think we could use span for the axes
for (int i = 0; i < transpose_out_view.num_samples(); ++i) {
  thread_pool.AddWork([&, i](int thread_id) {
    auto perm = make_span(transpose_axes_order);
    kernels::Transpose(transpose_out_view[i], in_view[i], perm);
Maybe kernels::TransposeGrouped, as it can automatically simplify the shapes when applicable?
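For intuition, the shape simplification this kind of grouped transpose can do amounts to merging source axes that stay consecutive under the permutation, shrinking the effective rank. A hypothetical helper (GroupConsecutiveAxes is not DALI API) could look like:

```cpp
#include <vector>

// Merge runs of consecutive source axes in a permutation into groups.
// Axes that are adjacent both before and after the transpose can be
// treated as one fused axis, so a rank-N transpose may reduce to rank-K, K < N.
std::vector<std::vector<int>> GroupConsecutiveAxes(const std::vector<int> &perm) {
  std::vector<std::vector<int>> groups;
  for (int axis : perm) {
    if (!groups.empty() && groups.back().back() + 1 == axis)
      groups.back().push_back(axis);  // extends the previous run -> same group
    else
      groups.push_back({axis});       // starts a new group
  }
  return groups;
}
```

For example, the permutation {2, 0, 1} groups into {2} and {0, 1}, so the inner two axes can be fused and a 3D transpose becomes a 2D one.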
  });
}
thread_pool.RunAll(true);
return reinterpret<const Type>(transpose_out_view, transpose_out_view.shape);
Suggested change:
return transpose_out_view;
would do, I think
Yeah, the conversion to const view is implicit. Reinterpret is a heavy hammer - like with a cast, it will hide any possible errors until run-time.
using namespace dali;
using namespace dali::hist_detail;

#define id_(x) x
That's quite unnecessary.
auto range_view = view<const float>(dim_ranges);
for (int i = 0; i < range_view[sample].num_elements(); ++i) {
  ranges.push_back(range_view.tensor_data(sample)[i]);
I'd recommend hoisting the sample access to the outer loop.
Also, num_elements() is not trivial and the compiler might fail to identify it as a loop invariant.

Suggested change:
auto range_view = view<const float>(dim_ranges)[sample];
for (int i = 0, n = range_view.num_elements(); i < n; ++i) {
  ranges.push_back(range_view.data[i]);
template <typename Ty_>
struct CVMatType {
  static int get(int) {
    DALI_ENFORCE(false, "Unreachable - invalid type");
  }
};
How about getting rid of the specializations and going with:
template <typename ChannelType>
struct CVMatType {
  static int get(int nchannels) {
    return CV_MAKETYPE(cv::DataDepth<ChannelType>::type, nchannels);
  }
};
?
Also, the trailing underscore in template arguments is not a coding style we use.
if (is_identity_) {
  auto out_view_id = view<Type>(output);
  run_identity<Type>(thread_pool, in_view, out_view_id);
  return;
indentation is slightly off
std::vector<cv::Mat> images = {
    cv::Mat(1, in_sizes.data(), in_type, splited_in_views[i].data)};

cv::InputArray input_mat(images);
I don't think you need a cv::InputArray. You can pass cv::Mat to cv::calcHist.
TensorView<StorageCPU, float> hist_view(hist_data, splited_out_views[i].shape);
kernels::copy(splited_out_views[i], hist_view);
Indentation is off in those two lines
ret.resize(num_samples, hist_dim_);

for (int i = 0; i < num_samples; ++i) {
  TensorShape<> bin_shape{make_span(batch_bins_[i].data(), hist_dim_)};
Suggested change:
TensorShape<> bin_shape{make_span(batch_bins_[i])};
split_mapping_ = std::move(split_mapping);
splited_input_shapes_ = std::move(splited_input_shapes);
splited_output_shapes_ = std::move(splited_output_shapes);
Why not operate directly on those, avoiding unnecessary allocations?
.AddParent("HistogramBase");

DALI_SCHEMA(UniformHistogramOpName)
    .DocStr(R"code(Calculates 1D or ND histogram of the input tensor with uniform histogram bin ranges.
Suggested change:
.DocStr(R"code(Calculates 1D or ND histogram with uniform histogram bin ranges.
Calculates histogram of of uniform bin ranges, second input tensor specifies lower bound of range of values and third argument specfies
upper range of values in each histogram dimension.

For example lower range (2nd tensor argument) ``[0]`` and upper range (3rd tensor argument) ``[255]`` and ``num_bins=[16]`` will calculate histogram
with 16 bins of uniformly subdivided in range ``[0, 255]``
Suggested change:
This operator calculates ``num_bins`` uniform ranges covering the range defined by the lower and upper bounds given by positional inputs.
For example, a lower range (2nd positional input) ``[0]`` and an upper range (3rd positional input) ``[255]``, together with ``num_bins=16``, will calculate a histogram with 16 uniform bins in the range ``[0, 255]``.
The histogram of the input tensor when ``channel_axis`` or ``channel_axis_name`` is specified is calculated as multidimensional
histogram ie. as if histogram would be calculated for each seperate channel of this axis.
If channel axis is not specified 1D histogram is calculated.
Current implentation supports up to 32 channels for histogram calculation.

Histogram calculation supports specifing arbitrary axes of reduction.

For example for tensor with layout "HWC" one could calculate different single and multidimentional histograms.
I posted some comments in the other occurrence of this text. I think those comments apply here as well
bool IsIdentityTransform() const {
  return is_identity_;
}
bool IsSimpleReduction1() const;
where is this used?
std::vector<std::vector<float>> batch_ranges_;
std::vector<SmallVector<int, 3>> batch_bins_;
ArgValue<int, 1> param_num_bins_;
kernels::ScratchpadAllocator transpose_mem_;
Suggested change:
kernels::DynamicScratchpad transpose_mem_;
for ret_sz, expected_sz in zip(out.shape(), sz.as_array()):
    assert(ret_sz == tuple(expected_sz))

def test_uniform_hist_args():
Instead of running several tests in one function, I'd suggest creating independent functions for each test. This way, when a test case fails, we can see exactly which one.
}

split_mapping_ = std::move(split_mapping);
splited_input_shapes_ = std::move(splited_input_shapes);
"Split" is an irregular verb. split/split/split.
Suggested change:
split_input_shapes_ = std::move(split_input_shapes);
Signed-off-by: Piotr Rak [email protected]
Requirements
REQ IDs: N/A
JIRA TASK: N/A