Histogram skeleton. #3502

Draft: prak-nv wants to merge 1 commit into main from histogram_cpu
Conversation

@prak-nv (Contributor) commented Nov 17, 2021

Signed-off-by: Piotr Rak [email protected]

Description

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactoring (Redesign of existing code that doesn't affect functionality)
  • Other (e.g. Documentation, Tests, Configuration)

What happened in this PR

Additional information

  • Affected modules and functionalities:
  • Key points relevant for the review:

Checklist

Tests

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: N/A

@prak-nv force-pushed the histogram_cpu branch 2 times, most recently from 29a20c2 to c3ccc6c on January 12, 2022 10:42
Comment on lines 27 to 45
int dim = ndims - hist_dim;

// Starting from inner dimension, look if we should reduce this axis
// If we can do so until we can collapse next dimention.
auto raxis = reduction_axes.rbegin();

for (; raxis != reduction_axes.rend(); ++raxis) {
// TODO: Handle multidimentional histograms correctly
if (dim != *raxis) {
break;
}
--dim;
}

// If there's no reduction axes left, we won't need transpose any axis.
if (raxis == reduction_axes.rend())
return true;

return false;
Contributor commented:

This seems a bit overcomplicated - isn't it equivalent to the following?

Suggested change
int dim = ndims - hist_dim;
// Starting from inner dimension, look if we should reduce this axis
// If we can do so until we can collapse next dimention.
auto raxis = reduction_axes.rbegin();
for (; raxis != reduction_axes.rend(); ++raxis) {
// TODO: Handle multidimentional histograms correctly
if (dim != *raxis) {
break;
}
--dim;
}
// If there's no reduction axes left, we won't need transpose any axis.
if (raxis == reduction_axes.rend())
return true;
return false;
if (reduction_axes.size() != ndims - hist_dims)
  return false;  // not all dimensions reduced
for (int i = 0; i < ndims - hist_dims; i++) {
  if (reduction_axes[i] != i)
    return false;
}
return true;


SmallVector<int, 6> axes_order;
axes_order.reserve(ndims);
for (int axis : non_rediction_axes) {
Contributor commented:

Suggested change
for (int axis : non_rediction_axes) {
for (int axis : non_reduction_axes) {

const int ndims = input_shapes.sample_dim();

auto shape_span
auto non_rediction_axes = GetNonReductionAxes(ndims);
Contributor commented:

Suggested change
auto non_rediction_axes = GetNonReductionAxes(ndims);
auto non_reduction_axes = GetNonReductionAxes(ndims);

However, how about using a bit mask for the reduced axes? This would greatly simplify the code. You'd create the mask once and then just check whether a given axis is marked as reduced. We even have a bitmask class, so there's no need to manually shift/mask stuff.
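A minimal sketch of the bitmask idea, using std::bitset as a stand-in for DALI's own bitmask class (the AxisMask helper is hypothetical, for illustration only):

#include <bitset>
#include <vector>

// Hypothetical helper illustrating the bitmask approach; DALI's own
// bitmask class could be used in place of std::bitset.
struct AxisMask {
  std::bitset<64> reduced_;  // one bit per axis

  explicit AxisMask(const std::vector<int> &reduction_axes) {
    for (int axis : reduction_axes)
      reduced_.set(axis);  // build the mask once
  }

  // Checking an axis is a single lookup - no manual shift/mask code.
  bool IsReduced(int axis) const { return reduced_.test(axis); }
};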

return false;
}

// Collapses all inner dimensions
Contributor commented:

Can't we also collapse all outer dimensions?
After rearranging the axes so that we have
non-reduced reduced [channel]
we can then collapse (or expand!) each of these groups, so that we have exactly one non-reduced dimension, one reduced dimension and one channel dimension.
If it happens that one of these groups is empty, we can always insert a phony axis with extent 1.
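A sketch of the normalization described above, assuming the axes have already been permuted into non-reduced/reduced/channel order (names and signature are illustrative, not the PR's actual API):

#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Collapse a permuted shape into exactly three extents:
// {non-reduced, reduced, channel}. An empty group multiplies out to 1,
// which is exactly the phony axis with extent 1.
std::vector<int64_t> CollapseTo3D(const std::vector<int64_t> &shape,
                                  int num_non_reduced, int num_reduced,
                                  bool has_channels) {
  auto begin = shape.begin();
  int64_t non_reduced = std::accumulate(begin, begin + num_non_reduced,
                                        int64_t{1}, std::multiplies<>{});
  int64_t reduced = std::accumulate(begin + num_non_reduced,
                                    begin + num_non_reduced + num_reduced,
                                    int64_t{1}, std::multiplies<>{});
  int64_t channels = has_channels ? shape.back() : 1;
  return {non_reduced, reduced, channels};
}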

int hist_dim = 1) {
int dim = ndims - hist_dim;

DALI_ENFORCE(dim >= 1);
Contributor commented:

I don't think simple queries like that should throw. Passing hist_dim < 1 constitutes a coding error. Since this function is internal, assert should be enough.
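A sketch of what this suggests, with the function name made up for illustration:

#include <cassert>

// Internal helper: a hist_dim that leaves no data dimension is a coding
// error, so an assert is enough - no need to throw.
int DataDims(int ndims, int hist_dim = 1) {
  int dim = ndims - hist_dim;
  assert(dim >= 1);
  return dim;
}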

@NVIDIA deleted a comment from dali-automaton Jan 27, 2022
@dali-automaton (Collaborator): CI MESSAGE: [3840090]: BUILD STARTED

@prak-nv force-pushed the histogram_cpu branch 2 times, most recently from a399a60 to d9c477f on February 1, 2022 12:07

TensorListShape<ret_ndim> result(nshapes, dyn_out_ndim);

if (out_ndim == 0) {
Contributor commented:

Suggested change
if (out_ndim == 0) {
if (dyn_out_ndim == 0) {


const int dyn_out_ndim = shape.size() - 1;

int nshapes = shape[0];
Contributor commented:

Suggested change
int nshapes = shape[0];
int outer_extent = shape[0];

or

Suggested change
int nshapes = shape[0];
int nsamples = shape[0];

}

void HistogramCPU::PrepareInputShapesForTranspose(const TensorListShape<> &input_shapes,
const SmallVector<int, 6> &non_reduction_axes) {
Contributor commented:

I think it would be easier to work with a bit mask or a set.

@prak-nv force-pushed the histogram_cpu branch 9 times, most recently from 8ca05b9 to d221251 on February 17, 2022 21:26
@prak-nv force-pushed the histogram_cpu branch 2 times, most recently from 2ee5046 to 606fafb on March 8, 2022 15:25
~HistogramCPU() override = default;

private:
int VerifyRangeArguments(const workspace_t<CPUBackend> &ws, int num_samples);
Contributor commented:

Grammar Nazi: Verify -> Validate

auto lo_view = view<const float>(ranges_lo);
auto hi_view = view<const float>(ranges_hi);

int hist_dim = ranges_lo.num_samples() / num_samples;
@mzient commented Mar 15, 2022:

Perhaps it would be better to enforce the exact shape (a scalar when there is no channel dimension and [num_channels] otherwise).
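A sketch of that exact-shape check; the helper and its parameter names are made up for illustration, not the PR's actual API:

// Illustrative only: each per-sample range must be a scalar when there is
// no channel dimension, or have shape [num_channels] otherwise.
template <typename RangeView>
void ValidateRangeShape(const RangeView &ranges, int sample_idx,
                        int num_channels, bool has_channel_axis) {
  const auto &sh = ranges.tensor_shape(sample_idx);
  if (!has_channel_axis) {
    DALI_ENFORCE(sh.sample_dim() == 0,
                 "Range must be a scalar when there is no channel dimension.");
  } else {
    DALI_ENFORCE(sh.sample_dim() == 1 && sh[0] == num_channels,
                 "Range must have shape [num_channels].");
  }
}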

int HistogramCPU::VerifyNonUniformRangeArguments(const workspace_t<CPUBackend> &ws, int num_samples) {
assert(!uniform_);

DALI_ENFORCE(ws.NumInput() % 2 == 1, "Should have both ranges"); // FIXME message
@mzient commented Mar 15, 2022:

Waaait... what?

@JanuszL mentioned this pull request Mar 30, 2022
Signed-off-by: Piotr Rak <[email protected]>
@@ -0,0 +1,18 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
Contributor commented:

Suggested change
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.

Contributor commented:

To be precise:

Suggested change
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
# Copyright (c) 2019-2022, NVIDIA CORPORATION. All rights reserved.

or

Suggested change
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
# Copyright (c) 2019, 2022, NVIDIA CORPORATION. All rights reserved.

# Get all the source files and dump test files
collect_headers(DALI_INST_HDRS PARENT_SCOPE)
collect_sources(DALI_OPERATOR_SRCS PARENT_SCOPE)
collect_test_sources(DALI_OPERATOR_TEST_SRCS PARENT_SCOPE)
Contributor commented:

nitpick: missing empty line at the end

@@ -0,0 +1,102 @@
// Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Contributor commented:

Suggested change
// Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

@@ -0,0 +1,60 @@
// Copyright (c) 2020-2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Contributor commented:

Suggested change
// Copyright (c) 2020-2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// Copyright (c) 2020-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

@@ -993,7 +993,7 @@ inline T OpSchema::GetDefaultValueForArgument(const std::string &s) const {
}

#define DALI_SCHEMA_REG(OpName) \
int DALI_OPERATOR_SCHEMA_REQUIRED_FOR_##OpName() { \
int CONCAT_2(DALI_OPERATOR_SCHEMA_REQUIRED_FOR_, OpName)() { \
Contributor commented:

Just a note: in general, I think it's best to keep unrelated changes separate (in a separate, small PR).

Comment on lines +67 to +72
template <>
struct CVMatType<uint8_t> {
static int get(int nchannel) noexcept {
return CV_MAKETYPE(CV_8U, nchannel);
}
};
Contributor commented:

The struct seems unnecessary. How about:

template <typename Ty_>
int CVMatType(int nchannel) {
    return CV_MAKETYPE(CV_8U, nchannel);
}

Comment on lines +103 to +112
std::vector<Type *> tmp_pointers;
tmp_pointers.reserve(transposed_shapes.num_samples());

for (int i = 0; i < transposed_shapes.num_samples(); ++i) {
auto tmp = scratch.template AllocTensor<mm::memory_kind::host, Type>(transposed_shapes[i]);
tmp_pointers.push_back(tmp.data);
}

TensorListView<StorageCPU, Type> transpose_out_view(std::move(tmp_pointers),
std::move(transposed_shapes));
Contributor commented:

Suggested change
std::vector<Type *> tmp_pointers;
tmp_pointers.reserve(transposed_shapes.num_samples());
for (int i = 0; i < transposed_shapes.num_samples(); ++i) {
auto tmp = scratch.template AllocTensor<mm::memory_kind::host, Type>(transposed_shapes[i]);
tmp_pointers.push_back(tmp.data);
}
TensorListView<StorageCPU, Type> transpose_out_view(std::move(tmp_pointers),
std::move(transposed_shapes));
auto transpose_out_view = scratch.template AllocTensorList<mm::memory_kind::host, Type>(std::move(transposed_shapes));
}

Comment on lines +95 to +98
template <typename Type, typename ScratchAlloc, typename Coll>
TensorListView<StorageCPU, const Type> transpose_view(
dali::ThreadPool &thread_pool, ScratchAlloc &scratch,
const TensorListView<StorageCPU, const Type> &in_view, const Coll &transpose_axes_order) {
Contributor commented:

Suggested change
template <typename Type, typename ScratchAlloc, typename Coll>
TensorListView<StorageCPU, const Type> transpose_view(
dali::ThreadPool &thread_pool, ScratchAlloc &scratch,
const TensorListView<StorageCPU, const Type> &in_view, const Coll &transpose_axes_order) {
template <typename Type>
TensorListView<StorageCPU, const Type> transpose_view(
dali::ThreadPool &thread_pool, Scratchpad &scratch,
const TensorListView<StorageCPU, const Type> &in_view, span<const int> transpose_axes_order) {

Scratchpad is an interface. Also, I think we could use span for the axes

for (int i = 0; i < transpose_out_view.num_samples(); ++i) {
thread_pool.AddWork([&, i](int thread_id) {
auto perm = make_span(transpose_axes_order);
kernels::Transpose(transpose_out_view[i], in_view[i], perm);
Contributor commented:

maybe kernels::TransposeGrouped as it can automatically simplify the shapes when applicable?
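Assuming kernels::TransposeGrouped takes the same output-view, input-view and permutation arguments as kernels::Transpose, the swap would be a one-line change inside the loop (a sketch, not verified against the kernel's actual signature):

// TransposeGrouped can collapse adjacent axes that move together before
// transposing, which simplifies the shapes automatically.
kernels::TransposeGrouped(transpose_out_view[i], in_view[i], perm);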

});
}
thread_pool.RunAll(true);
return reinterpret<const Type>(transpose_out_view, transpose_out_view.shape);
Contributor commented:

Suggested change
return reinterpret<const Type>(transpose_out_view, transpose_out_view.shape);
return transpose_out_view;

would do, I think

Contributor commented:

Yeah, the conversion to const view is implicit. Reinterpret is a heavy hammer - like with a cast, it will hide any possible errors until run-time.

using namespace dali;
using namespace dali::hist_detail;

#define id_(x) x
Contributor commented:

That's quite unnecessary.

Comment on lines +52 to +54
auto range_view = view<const float>(dim_ranges);
for (int i = 0; i < range_view[sample].num_elements(); ++i) {
ranges.push_back(range_view.tensor_data(sample)[i]);
@mzient commented Apr 7, 2022:

I'd recommend hoisting the sample access to the outer loop.
Also, num_elements() is not trivial and the compiler might fail to identify it as a loop invariant.

Suggested change
auto range_view = view<const float>(dim_ranges);
for (int i = 0; i < range_view[sample].num_elements(); ++i) {
ranges.push_back(range_view.tensor_data(sample)[i]);
auto range_view = view<const float>(dim_ranges)[sample];
for (int i = 0, n = range_view.num_elements(); i < n; ++i) {
ranges.push_back(range_view.data[i]);

Comment on lines +60 to +65
template <typename Ty_>
struct CVMatType {
static int get(int) {
DALI_ENFORCE(false, "Unreachable - invalid type");
}
};
Contributor commented:

How about getting rid of the specializations and going with:

Suggested change
template <typename Ty_>
struct CVMatType {
static int get(int) {
DALI_ENFORCE(false, "Unreachable - invalid type");
}
};
template <typename ChannelType>
struct CVMatType {
static int get(int nchannels) {
return CV_MAKETYPE(cv::DataDepth<ChannelType>::value, nchannels);
}
};

?

Also, the trailing underscore in template arguments is not a coding style we use.

if (is_identity_) {
auto out_view_id = view<Type>(output);
run_identity<Type>(thread_pool, in_view, out_view_id);
return;
Contributor commented:

indentation is slightly off

Comment on lines +567 to +571
std::vector<cv::Mat> images = {
cv::Mat(1, in_sizes.data(), in_type, splited_in_views[i].data)};

cv::InputArray input_mat(images);

Contributor commented:

I don't think you need a cv::InputArray. You can pass a cv::Mat to cv::calcHist.
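A minimal sketch of the direct call, reusing the local names from the snippet above; num_bins, range_lo and range_hi are assumed stand-ins for the PR's actual bin/range variables:

#include <opencv2/imgproc.hpp>

// Build the single input image as in the snippet above.
cv::Mat image(1, in_sizes.data(), in_type, splited_in_views[i].data);

int channels[] = {0};
int hist_size[] = {num_bins};
float bounds[] = {range_lo, range_hi};
const float *ranges[] = {bounds};
cv::Mat hist;

// The pointer-based overload of cv::calcHist accepts a cv::Mat directly;
// no cv::InputArray wrapper or std::vector<cv::Mat> is needed.
cv::calcHist(&image, 1, channels, cv::noArray(), hist, 1, hist_size, ranges);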

Comment on lines +592 to +593
TensorView<StorageCPU, float> hist_view(hist_data, splited_out_views[i].shape);
kernels::copy(splited_out_views[i], hist_view);
Contributor commented:

Indentation is off in those two lines

ret.resize(num_samples, hist_dim_);

for (int i = 0; i < num_samples; ++i) {
TensorShape<> bin_shape{make_span(batch_bins_[i].data(), hist_dim_)};
Contributor commented:

Suggested change
TensorShape<> bin_shape{make_span(batch_bins_[i].data(), hist_dim_)};
TensorShape<> bin_shape{make_span(batch_bins_[i])};

Comment on lines +365 to +367
split_mapping_ = std::move(split_mapping);
splited_input_shapes_ = std::move(splited_input_shapes);
splited_output_shapes_ = std::move(splited_output_shapes);
Contributor commented:

Why not operate directly on those? It would avoid unnecessary allocations.

.AddParent("HistogramBase");

DALI_SCHEMA(UniformHistogramOpName)
.DocStr(R"code(Calculates 1D or ND histogram of the input tensor with uniform histogram bin ranges.
Contributor commented:

Suggested change
.DocStr(R"code(Calculates 1D or ND histogram of the input tensor with uniform histogram bin ranges.
.DocStr(R"code(Calculates 1D or ND histogram of with uniform histogram bin ranges.

Comment on lines +658 to +662
Calculates histogram of of uniform bin ranges, second input tensor specifies lower bound of range of values and third argument specfies
upper range of values in each histogram dimension.

For example lower range (2nd tensor argument) ``[0]`` and upper range (3rd tensor argument) ``[255]`` and ``num_bins=[16]`` will calculate histogram
with 16 bins of uniformly subdivided in range ``[0, 255]``
Contributor commented:

Suggested change
Calculates histogram of of uniform bin ranges, second input tensor specifies lower bound of range of values and third argument specfies
upper range of values in each histogram dimension.
For example lower range (2nd tensor argument) ``[0]`` and upper range (3rd tensor argument) ``[255]`` and ``num_bins=[16]`` will calculate histogram
with 16 bins of uniformly subdivided in range ``[0, 255]``
This operator calculates ``num_bins`` uniform ranges covering the range defined by the lower and upper bounds given as positional inputs.
For example, a lower range (2nd positional input) ``[0]`` and an upper range (3rd positional input) ``[255]``, together with a ``num_bins=16`` will calculate a histogram with 16 uniform bins in the range ``[0, 255]``.

Comment on lines +666 to +673
The histogram of the input tensor when ``channel_axis`` or ``channel_axis_name`` is specified is calculated as multidimensional
histogram ie. as if histogram would be calculated for each seperate channel of this axis.
If channel axis is not specified 1D histogram is calculated.
Current implentation supports up to 32 channels for histogram calculation.

Histogram calculation supports specifing arbitrary axes of reduction.

For example for tensor with layout "HWC" one could calculate different single and multidimentional histograms.
Contributor commented:

I posted some comments in the other occurrence of this text. I think those comments apply here as well

bool IsIdentityTransform() const {
return is_identity_;
}
bool IsSimpleReduction1() const;
Contributor commented:

where is this used?

std::vector<std::vector<float>> batch_ranges_;
std::vector<SmallVector<int, 3>> batch_bins_;
ArgValue<int, 1> param_num_bins_;
kernels::ScratchpadAllocator transpose_mem_;
Contributor commented:

Suggested change
kernels::ScratchpadAllocator transpose_mem_;
kernels::DynamicScratchpad transpose_mem_;

for ret_sz, expected_sz in zip(out.shape(), sz.as_array()):
assert(ret_sz == tuple(expected_sz))

def test_uniform_hist_args():
Contributor commented:

Instead of running several tests in one function, I'd suggest creating independent functions for each test. This way, when a test case fails, we can see exactly which one.

}

split_mapping_ = std::move(split_mapping);
splited_input_shapes_ = std::move(splited_input_shapes);
Contributor commented:

"Split" is an irregular verb. split/split/split.

Suggested change
splited_input_shapes_ = std::move(splited_input_shapes);
split_input_shapes_ = std::move(split_input_shapes);

@mzient removed their assignment Nov 10, 2022
@jantonguirao removed their assignment Jan 10, 2023