
Add Laplacian CPU kernel #3518

Merged
stiepan merged 6 commits into NVIDIA:main from laplacian_cpu_kernel on Dec 8, 2021
Conversation

@stiepan (Member) commented Nov 19, 2021

Signed-off-by: Kamil Tokarski [email protected]

Add Laplacian CPU kernel

Description

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactoring (Redesign of existing code that doesn't affect functionality)
  • Other (e.g. Documentation, Tests, Configuration)

What happened in this PR

This PR adds a Laplacian CPU kernel along with a few GTest tests.
The kernel boils down to running a few separable convolutions and summing their results. The purpose of the specializations is to first allocate an intermediate buffer for accumulating the results (or to use the output buffer where applicable) and to pass appropriate transforms to the convolutions, so that the convolution results are accumulated in the same pass as the convolution computation; a standalone sketch of the idea follows below.
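Purely as an illustration (not the DALI kernel code: the helper names here are hypothetical, and the accumulation is done in a separate loop rather than fused into the convolution pass via transforms as the PR does), a minimal standalone 2D version of the scheme:

#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: convolve a row-major height x width image with a 1D
// window along the given axis (0 = rows/y, 1 = columns/x), clamping at borders.
std::vector<float> Convolve1D(const std::vector<float> &in, int height, int width,
                              const std::vector<float> &window, int axis) {
  std::vector<float> out(in.size(), 0.f);
  int r = static_cast<int>(window.size()) / 2;
  for (int y = 0; y < height; y++) {
    for (int x = 0; x < width; x++) {
      float acc = 0.f;
      for (int k = -r; k <= r; k++) {
        int yy = axis == 0 ? std::min(std::max(y + k, 0), height - 1) : y;
        int xx = axis == 1 ? std::min(std::max(x + k, 0), width - 1) : x;
        acc += window[k + r] * in[yy * width + xx];
      }
      out[y * width + x] = acc;
    }
  }
  return out;
}

// Laplacian as a sum of per-axis second-derivative convolutions, gathered in an
// intermediate accumulator buffer.
std::vector<float> Laplacian2D(const std::vector<float> &in, int height, int width) {
  const std::vector<float> d2 = {1.f, -2.f, 1.f};  // second-derivative window
  std::vector<float> acc(in.size(), 0.f);
  for (int axis = 0; axis < 2; axis++) {
    std::vector<float> partial = Convolve1D(in, height, width, d2, axis);
    for (std::size_t i = 0; i < acc.size(); i++)
      acc[i] += partial[i];  // the PR fuses this accumulation into the convolution pass
  }
  return acc;
}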

Additional information

  • Key points relevant for the review:

Checklist

Tests

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: DALI-2471

@stiepan (Member Author) commented Nov 19, 2021

!build

@dali-automaton (Collaborator): CI MESSAGE: [3438316]: BUILD STARTED

@dali-automaton (Collaborator): CI MESSAGE: [3438316]: BUILD FAILED

@dali-automaton (Collaborator): CI MESSAGE: [3438316]: BUILD PASSED

@stiepan stiepan force-pushed the laplacian_cpu_kernel branch from dea2303 to 7a05d36 on November 23, 2021 10:53
@stiepan stiepan marked this pull request as ready for review November 23, 2021 10:55
@stiepan (Member Author) commented Nov 23, 2021

!build

@dali-automaton (Collaborator): CI MESSAGE: [3459044]: BUILD STARTED

@dali-automaton (Collaborator): CI MESSAGE: [3459044]: BUILD PASSED

@stiepan stiepan force-pushed the laplacian_cpu_kernel branch from 71a0bd6 to 354c550 on November 24, 2021 11:19
@stiepan (Member Author) commented Nov 24, 2021

!build

@dali-automaton (Collaborator): CI MESSAGE: [3467428]: BUILD STARTED

@dali-automaton (Collaborator): CI MESSAGE: [3467428]: BUILD PASSED

@klecki (Contributor) left a comment:

I feel like the axis order is reversed in Laplacian compared to what we do in the convolutions.

Some nitpicks.

Also, I think the convolution changes should go in a separate PR (I could quickly approve the convolution + tests changes), and the Laplacian + Laplacian test (I didn't review the test yet) should go to a separate one.

namespace conv_transform {

/**
* @brief Transforms enable postprocessing of values computed by 1D convolution before
klecki (Contributor):

Just a nitpick, but this docstring is only for TransScaleSat; you may add a group surrounding the classes below.

@stiepan (Member Author) replied Nov 25, 2021:

Changed that in a separate PR.
#3535
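For reference, a minimal sketch of the kind of Doxygen grouping being suggested (the group name here is a placeholder, not the one used in the PR):

/**
 * @defgroup conv_transforms Convolution output transforms
 * Transforms enable postprocessing of values computed by 1D convolution before
 * the results are written out.
 * @{
 */
// ... TransScaleSat and the remaining transform structs go here ...
/** @} */  // end of conv_transforms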

out_ptr[offset] = ConvertSat<Out>(val * scale);
}

float scale;
klecki (Contributor):

I assume that the default constructor, due to the default argument, initializes it to 1.0f?
Maybe we should just slap a

Suggested change
float scale;
float scale = 1.f;

here?

stiepan (Member Author):

However, it may be the case that somebody wants to specify a different scaling factor, right? In that case this constructor is still needed.
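The two are not mutually exclusive; a minimal sketch (with a hypothetical name standing in for TransScaleSat) of a transform that keeps a constructor for a custom scaling factor while also defaulting the member as suggested:

struct ScaleTransform {
  ScaleTransform() = default;            // scale stays at 1.f
  ScaleTransform(float s) : scale(s) {}  // custom scaling factor (non-explicit,
                                         // so a bare float can still convert to it)
  float scale = 1.f;                     // default member initializer, as suggested
};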

*/
template <typename Out, typename In, typename W, int axes, int deriv_axis,
bool has_channels = false, typename T = conv_transform::TransScaleSat<Out, W>>
struct Convolution {
klecki (Contributor):

I think, as this is mainly used for the calculation of derivatives, we should name it a bit differently than just Convolution.

stiepan (Member Author):

Renamed it to PartialDeriv

struct Convolution {
using MultiDimConv = SeparableConvolutionCpu<Out, In, W, axes, has_channels, T>;
static constexpr int ndim = MultiDimConv::ndim;
using SingleDimConv = ConvolutionCpu<Out, In, W, ndim, deriv_axis, has_channels, T>;
klecki (Contributor):

In the case of DxKernel we use deriv_axis=0, but the convolution assumes that x in the HW layout is the last axis, so I would expect 1 to be used here.

stiepan (Member Author):

I reversed the x, y, z naming of the sub kernels.
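For reference, the indexing convention being discussed, sketched against the template parameters quoted above (schematic only; per the reply, the final code instead reverses the x/y/z naming of the sub-kernels):

// In a HW[C] layout, data axis 0 runs over rows (y / height) and data axis 1 over
// columns (x / width), so a d^2/dx^2 kernel should convolve along data axis 1:
//   using DyKernel = Convolution<Out, In, W, /*axes=*/2, /*deriv_axis=*/0, has_channels>;
//   using DxKernel = Convolution<Out, In, W, /*axes=*/2, /*deriv_axis=*/1, has_channels>;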

sub_ctx.scratchpad = &sub_scratch;

// Clear the scratchpad for sub-kernels to reuse memory
sobel_dx_.Run(sub_ctx, acc, in, windows[0], scale[0]);
klecki (Contributor):

Suggested change
sobel_dx_.Run(sub_ctx, acc, in, windows[0], scale[0]);
sobel_dx_.Run(sub_ctx, acc, in, windows[0], {scale[0]});

Shouldn't we create a transform here? How does it work with a scale? Is it implicit conversion or something?

klecki (Contributor):

Also, we pass 2 scales (axes = 2) and only use one here. Is that intended?

Same in the 3D case.

klecki (Contributor):

Oh, I see it's packed into the separate transform.
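For context, a minimal standalone sketch of why passing a bare scale (or {scale[0]}) can compile at such a call site, assuming the transform has a non-explicit constructor taking the scale; all names here are hypothetical:

struct TransScale {
  TransScale(float s = 1.f) : scale(s) {}  // non-explicit: a float converts implicitly
  float scale;
};

template <typename Transform>
void RunConv(float *out, const float *in, int n, Transform t = {}) {
  for (int i = 0; i < n; i++)
    out[i] = in[i] * t.scale;              // transform applied in the same pass
}

void Example(float *out, const float *in, int n, float scale) {
  RunConv<TransScale>(out, in, n, scale);    // float -> TransScale implicit conversion
  RunConv<TransScale>(out, in, n, {scale});  // equivalent brace-initialized form
}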

bool has_channels>
struct LaplacianCPUBase<T, Intermediate, Out, In, W, 2, has_channels> {
static constexpr int axes = 2;
using DxKernel = Convolution<Intermediate, In, W, axes, 0, has_channels,
klecki (Contributor):

As mentioned above, the convolution in the 2D case assumes the HW[C] layout, so x is typically the second (index 1) data axis.

stiepan (Member Author):

I reversed the x, y, z naming of the sub kernels.

void Run(KernelContext& ctx, const TensorView<StorageCPU, Out, ndim> &out,
const TensorView<StorageCPU, Intermediate, ndim> &acc,
const TensorView<StorageCPU, const In, ndim>& in,
const std::array<std::array<TensorView<StorageCPU, const W, 1>, axes>, axes>& windows,
klecki (Contributor):

I'm thinking that we might need some docs about the nesting of windows at least :)

stiepan (Member Author):

I've added a brief docstring to the LaplacianCPU declaration that describes the ordering used in the windows-related arguments.

std::array<float, window_size> w = {0.};
w[0] = 1.;
for (int i = 1; i < window_size - d_order; i++) {
auto prevval = w[0];
klecki (Contributor):

Broken indentation.

stiepan (Member Author):

Should be fine now.
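As background for the loop above, one common way to build such windows, shown purely as an illustration of the idea rather than as the PR's exact code: start from {1}, then repeatedly convolve with {1, 1} for the smoothing part and with {1, -1} for each derivative order.

#include <cstddef>
#include <vector>

std::vector<float> SobelWindow(int window_size, int d_order) {
  std::vector<float> w = {1.f};
  auto convolve = [](const std::vector<float> &a, float k0, float k1) {
    std::vector<float> r(a.size() + 1, 0.f);
    for (std::size_t i = 0; i < a.size(); i++) {
      r[i] += k0 * a[i];
      r[i + 1] += k1 * a[i];
    }
    return r;
  };
  for (int i = 1; i < window_size - d_order; i++)
    w = convolve(w, 1.f, 1.f);   // binomial smoothing factor
  for (int i = 0; i < d_order; i++)
    w = convolve(w, 1.f, -1.f);  // derivative factor
  return w;  // e.g. window_size 3, d_order 2 -> {1, -2, 1}; d_order 0 -> {1, 2, 1}
}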

Comment on lines +79 to +109
LaplacianWindows() {
for (int i = 0; i < axes; i++) {
for (int j = 0; j < axes; j++) {
if (i == j) {
window_sizes[i][j] = window_size;
windows[i][j] = GetSobelWindow<window_size>(2);
tensor_windows[i][j] = {windows[i][j].data(), window_size};
} else if (use_smoothing) {
window_sizes[i][j] = window_size;
windows[i][j] = GetSobelWindow<window_size>(0);
tensor_windows[i][j] = {windows[i][j].data(), window_size};
} else {
window_sizes[i][j] = 1;
windows[i][j] = uniform_array<window_size>(0.f);
auto middle = window_size / 2;
windows[i][j][middle] = 1.f;
tensor_windows[i][j] = {windows[i][j].data() + middle, 1};
}
}
}
}
std::array<std::array<int, axes>, axes> window_sizes;
std::array<std::array<std::array<float, window_size>, axes>, axes> windows;
std::array<std::array<TensorView<StorageCPU, const float, 1>, axes>, axes> tensor_windows;
klecki (Contributor):

Again, I am lost about which level of nesting corresponds to what.

stiepan (Member Author):

It goes the same way as in LaplacianCPU.Run. To recap:
tensor_windows[i] describes the windows used to compute the i-th partial derivative (i.e. the one that approximates the second-order partial derivative along the i-th dimension, counting dimensions from left to right). So tensor_windows[i][i] is a window that should look something like [1, -2, 1], whereas for j != i, tensor_windows[i][j] is some kind of smoothing window.
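Spelled out for the 2D, window_size = 3 case (values only; with smoothing disabled the off-diagonal windows instead collapse to the size-1 window {1}, as in the else branch above):

// tensor_windows[i][j] is the window applied along data axis j while computing
// the i-th second-order partial derivative:
//   tensor_windows[0] = { {1, -2, 1}, {1, 2, 1} };  // d^2 along axis 0, smoothing along axis 1
//   tensor_windows[1] = { {1, 2, 1}, {1, -2, 1} };  // smoothing along axis 0, d^2 along axis 1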

Signed-off-by: Kamil Tokarski <[email protected]>
Signed-off-by: Kamil Tokarski <[email protected]>
@stiepan stiepan force-pushed the laplacian_cpu_kernel branch from 354c550 to 5e2cf9a on November 28, 2021 17:25
@stiepan (Member Author) commented Nov 28, 2021

!build

@dali-automaton (Collaborator): CI MESSAGE: [3487879]: BUILD STARTED

@dali-automaton (Collaborator): CI MESSAGE: [3487879]: BUILD FAILED

@dali-automaton (Collaborator): CI MESSAGE: [3487879]: BUILD PASSED

Comment on lines +40 to +41
* window of size 1 must be equal to `[1]`, this way, if window sizes in non-derivative directions
* are one, the smoothing convolutions can be skipped and only a single one-dimensional
Reviewer (Member):

Suggested change
* window of size 1 must be equal to `[1]`, this way, if window sizes in non-derivative directions
* are one, the smoothing convolutions can be skipped and only a single one-dimensional
* window of size 1 must be equal to `[1]`. This way, if window sizes in non-derivative directions
* are one, the smoothing convolutions can be skipped and only a single one-dimensional

And I feel there's something wrong with the latter sentence:

This way, if window sizes in non-derivative directions are one [...]

Should it be is one? Or maybe is [1]?

stiepan (Member Author):

For the 3D case there are in fact two smoothing window sizes for each partial derivative. This optimization handles the case where all the smoothing window sizes are 1, hence "are".
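A minimal sketch of the check implied by that sentence (a hypothetical helper, not the PR's code):

#include <array>

// For the i-th partial derivative: when every window in a non-derivative
// direction has size 1 (i.e. equals {1}), the smoothing convolutions can be
// skipped and only the single 1D derivative convolution needs to run.
template <int axes>
bool CanSkipSmoothing(const std::array<int, axes> &window_sizes, int deriv_axis) {
  for (int j = 0; j < axes; j++)
    if (j != deriv_axis && window_sizes[j] != 1)
      return false;
  return true;
}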


namespace laplacian {

using namespace conv_transform; // NOLINT
Reviewer (Member):

Do we need this here? I think I saw only a few references to this namespace; maybe it would be cleaner to qualify them explicitly?

Signed-off-by: Kamil Tokarski <[email protected]>
Signed-off-by: Kamil Tokarski <[email protected]>
Signed-off-by: Kamil Tokarski <[email protected]>
@stiepan (Member Author) commented Dec 7, 2021

!build

@dali-automaton (Collaborator): CI MESSAGE: [3544355]: BUILD STARTED

@dali-automaton (Collaborator): CI MESSAGE: [3544355]: BUILD PASSED

@stiepan stiepan merged commit 7ea3dfb into NVIDIA:main Dec 8, 2021
cyyever pushed commits to cyyever/DALI that referenced this pull request on Jan 23, 2022, Feb 21, 2022, May 13, 2022, and Jun 7, 2022:

* Add laplacian CPU kernel

Signed-off-by: Kamil Tokarski <[email protected]>

@JanuszL JanuszL mentioned this pull request Mar 30, 2022
Labels: none yet
Projects: none yet
5 participants