Add Laplacian CPU kernel #3518
Conversation
!build
CI MESSAGE: [3438316]: BUILD STARTED
CI MESSAGE: [3438316]: BUILD FAILED
CI MESSAGE: [3438316]: BUILD PASSED
Force-pushed dea2303 to 7a05d36
!build
CI MESSAGE: [3459044]: BUILD STARTED
CI MESSAGE: [3459044]: BUILD PASSED
Force-pushed 71a0bd6 to 354c550
!build
CI MESSAGE: [3467428]: BUILD STARTED
CI MESSAGE: [3467428]: BUILD PASSED
I feel like the axis order in Laplacian is reversed compared to what we do in the convolutions.
Some nitpicks.
Also, I think that the convolution changes should go in a separate PR (I could quickly approve the convolution + tests changes), and the Laplacian + Laplacian test (I didn't review the test yet) should go in another one.
namespace conv_transform {

/**
 * @brief Transforms enable postprocessing of values computed by 1D convolution before
Just a nitpick, but this docstring applies only to TransScaleSat; you may add a group surrounding the classes below.
Changed that in a separate PR.
#3535
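For illustration, a minimal Doxygen grouping sketch along the lines suggested above (the group name and the placeholder for the other structs are hypothetical; only TransScaleSat appears in this diff):

/**
 * @defgroup ConvTransforms Postprocessing of values computed by 1D convolution
 * @{
 */
template <typename Out, typename W>
struct TransScaleSat { /* ... */ };

/* ... the other transform structs would sit inside the same group ... */

/** @} */  // end of ConvTransforms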
out_ptr[offset] = ConvertSat<Out>(val * scale);
}

float scale;
I assume that the default constructor, due to the default argument, initializes it to 1.0f?
Maybe we should just slap a
- float scale;
+ float scale = 1.f;
here?
However, it may be the case that somebody wants to specify a different scaling factor, right? In that case this constructor is still needed.
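A minimal sketch of the combined outcome discussed above (simplified; the real struct also defines the call operator shown in the diff, and the constructor signature here is an assumption):

template <typename Out, typename W>
struct TransScaleSat {
  TransScaleSat(float scale = 1.f) : scale(scale) {}  // a custom factor is still possible
  // operator() applies ConvertSat<Out>(val * scale), as in the diff above
  float scale = 1.f;  // suggested default member initializer
};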
 */
template <typename Out, typename In, typename W, int axes, int deriv_axis,
          bool has_channels = false, typename T = conv_transform::TransScaleSat<Out, W>>
struct Convolution {
I think that, as this is mainly used for calculating derivatives, we should name it a bit differently than just Convolution.
Renamed it to PartialDeriv.
struct Convolution {
  using MultiDimConv = SeparableConvolutionCpu<Out, In, W, axes, has_channels, T>;
  static constexpr int ndim = MultiDimConv::ndim;
  using SingleDimConv = ConvolutionCpu<Out, In, W, ndim, deriv_axis, has_channels, T>;
In the case of DxKernel we place deriv_axis=0, but the convolution assumes that x in the HW layout is the last axis, so I would expect 1 to be used here.
I reversed the x, y, z naming of the sub kernels.
sub_ctx.scratchpad = &sub_scratch;

// Clear the scratchpad for sub-kernels to reuse memory
sobel_dx_.Run(sub_ctx, acc, in, windows[0], scale[0]);
- sobel_dx_.Run(sub_ctx, acc, in, windows[0], scale[0]);
+ sobel_dx_.Run(sub_ctx, acc, in, windows[0], {scale[0]});
Shouldn't we create a transform here? How does it work with a scale? Is it implicit conversion or something?
Also, we pass 2 scales (axes = 2) and only use one of them here. Is that intended?
Same in the 3D case.
Oh, I see it's packed into the separate transform.
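A self-contained illustration of the implicit-conversion question raised above (the types and the Run signature below are made up for the example and do not mirror the PR's actual code):

struct Transform {
  Transform(float scale = 1.f) : scale(scale) {}  // non-explicit, so a float converts implicitly
  float scale;
};

void Run(const Transform &t) { /* uses t.scale ... */ }

void Example(float s) {
  Run(s);    // relies on the implicit float -> Transform conversion
  Run({s});  // list-initializes the parameter, making the construction visible
}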
          bool has_channels>
struct LaplacianCPUBase<T, Intermediate, Out, In, W, 2, has_channels> {
  static constexpr int axes = 2;
  using DxKernel = Convolution<Intermediate, In, W, axes, 0, has_channels,
As mentioned above, the convolution in the 2D case assumes the HW[C] layout, so x is typically the second (index 1) data axis.
I reversed the x, y, z naming of the sub kernels.
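To illustrate the layout point with the template from this diff (the axis mapping in the comments is my reading of the review, not code from the PR):

// For a 2D HW[C] tensor, data axis 0 runs over rows (y) and data axis 1 over columns (x),
// so the kernel differentiating along x would pass deriv_axis = 1:
using DyKernel = Convolution<Intermediate, In, W, axes, 0, has_channels>;  // d^2/dy^2
using DxKernel = Convolution<Intermediate, In, W, axes, 1, has_channels>;  // d^2/dx^2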
void Run(KernelContext& ctx, const TensorView<StorageCPU, Out, ndim> &out,
         const TensorView<StorageCPU, Intermediate, ndim> &acc,
         const TensorView<StorageCPU, const In, ndim>& in,
         const std::array<std::array<TensorView<StorageCPU, const W, 1>, axes>, axes>& windows,
I'm thinking that we might need some docs about the nesting of windows at least :)
I've added a brief docstring to the LaplacianCPU declaration that describes the ordering used in the windows-related arguments.
std::array<float, window_size> w = {0.};
w[0] = 1.;
for (int i = 1; i < window_size - d_order; i++) {
  auto prevval = w[0];
Broken indentation.
Should be fine now.
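For reference, a standalone sketch of how such a window can be derived by repeated convolution (my own illustration, not the PR's actual GetSobelWindow implementation): start from [1], convolve with [1, 1] once per smoothing step and with [1, -1] once per derivative order.

#include <array>
#include <cstdio>

template <int window_size>
std::array<float, window_size> MakeDerivWindow(int d_order) {
  std::array<float, window_size> w = {};
  w[0] = 1.f;
  int len = 1;
  // Smoothing passes: convolve in place with [1, 1].
  for (int pass = 0; pass < window_size - 1 - d_order; pass++, len++)
    for (int i = len; i > 0; i--) w[i] += w[i - 1];
  // Derivative passes: convolve in place with [1, -1].
  for (int pass = 0; pass < d_order; pass++, len++)
    for (int i = len; i > 0; i--) w[i] -= w[i - 1];
  return w;
}

int main() {
  auto w = MakeDerivWindow<5>(2);
  for (float v : w) printf("%g ", v);  // prints: 1 0 -2 0 1
  printf("\n");
}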
LaplacianWindows() {
  for (int i = 0; i < axes; i++) {
    for (int j = 0; j < axes; j++) {
      if (i == j) {
        window_sizes[i][j] = window_size;
        windows[i][j] = GetSobelWindow<window_size>(2);
        tensor_windows[i][j] = {windows[i][j].data(), window_size};
      } else if (use_smoothing) {
        window_sizes[i][j] = window_size;
        windows[i][j] = GetSobelWindow<window_size>(0);
        tensor_windows[i][j] = {windows[i][j].data(), window_size};
      } else {
        window_sizes[i][j] = 1;
        windows[i][j] = uniform_array<window_size>(0.f);
        auto middle = window_size / 2;
        windows[i][j][middle] = 1.f;
        tensor_windows[i][j] = {windows[i][j].data() + middle, 1};
      }
    }
  }
}

std::array<std::array<int, axes>, axes> window_sizes;
std::array<std::array<std::array<float, window_size>, axes>, axes> windows;
std::array<std::array<TensorView<StorageCPU, const float, 1>, axes>, axes> tensor_windows;
Again, I am lost about which level of nesting corresponds to what.
It goes the same way as in LaplacianCPU::Run. To recap: tensor_windows[i] describes the windows used to compute the i-th partial derivative (i.e. the one that approximates the second-order partial derivative along the i-th dimension, counting dimensions from left to right). So tensor_windows[i][i] is a window that should look like [1, -2, 1], whereas for j != i, tensor_windows[i][j] is some kind of smoothing window.
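To make the nesting concrete, a small 2D illustration (axes = 2, window_size = 3; assuming the size-3 smoothing window is [1, 2, 1], as the derivation sketched earlier would produce):

// First index: which partial derivative is computed; second index: the data axis the window runs along.
// tensor_windows[0][0] -> [1, -2, 1]  (second derivative along axis 0)
// tensor_windows[0][1] -> [1,  2, 1]  (smoothing along axis 1, when use_smoothing is set)
// tensor_windows[1][0] -> [1,  2, 1]  (smoothing along axis 0)
// tensor_windows[1][1] -> [1, -2, 1]  (second derivative along axis 1)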
Signed-off-by: Kamil Tokarski <[email protected]>
Force-pushed 354c550 to 5e2cf9a
!build
CI MESSAGE: [3487879]: BUILD STARTED
CI MESSAGE: [3487879]: BUILD FAILED
CI MESSAGE: [3487879]: BUILD PASSED
 * window of size 1 must be equal to `[1]`, this way, if window sizes in non-derivative directions
 * are one, the smoothing convolutions can be skipped and only a single one-dimensional
- * window of size 1 must be equal to `[1]`, this way, if window sizes in non-derivative directions
- * are one, the smoothing convolutions can be skipped and only a single one-dimensional
+ * window of size 1 must be equal to `[1]`. This way, if window sizes in non-derivative directions
+ * are one, the smoothing convolutions can be skipped and only a single one-dimensional
And I feel there's something wrong with the latter sentence: "This way, if window sizes in non-derivative directions are one [...]". Should it be "is one"? Or maybe "is [1]"?
For the 3D case there are in fact two smoothing window sizes for each partial derivative. This optimization handles the case where all the smoothing window sizes are 1, hence "are".
namespace laplacian {

using namespace conv_transform;  // NOLINT
Do we need this here? I think I saw only a few references to this namespace; maybe it would be cleaner to do it explicitly?
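For comparison, the explicit alternative could look like this (only TransScaleSat is confirmed in this diff; other names from conv_transform would follow the same pattern):

namespace laplacian {

// Either qualify at the point of use, e.g. conv_transform::TransScaleSat<Out, W>,
// or import only the names that are actually needed:
using conv_transform::TransScaleSat;  // NOLINT

}  // namespace laplacian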
Signed-off-by: Kamil Tokarski <[email protected]>
!build
CI MESSAGE: [3544355]: BUILD STARTED
CI MESSAGE: [3544355]: BUILD PASSED
* Add laplacian CPU kernel

Signed-off-by: Kamil Tokarski <[email protected]>
Add Laplacian CPU kernel
Description
What happened in this PR
This PR adds a Laplacian CPU kernel along with a few GTest tests.
It boils down to running a few separable convolutions and summing the results. The purpose of the specializations is to first allocate an intermediate buffer for accumulating the results (or to use the output buffer if applicable) and to pass appropriate transforms to the convolutions, so that the results are accumulated in the same pass as the convolution computation.
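Roughly, the accumulation scheme looks like the following conceptual sketch (helper names are made up for illustration and are not the kernel's actual API):

#include <cstddef>
#include <vector>

// Per-element transforms mirroring the "write scaled" vs. "add scaled" idea.
struct ScaleWrite {
  float scale;
  void operator()(float *acc, std::size_t i, float val) const { acc[i] = val * scale; }
};
struct ScaleAdd {
  float scale;
  void operator()(float *acc, std::size_t i, float val) const { acc[i] += val * scale; }
};

// Stand-in for one separable convolution computing a single partial derivative.
template <typename Transform>
void RunPartialDeriv(std::vector<float> &acc, const std::vector<float> &in, Transform t) {
  for (std::size_t i = 0; i < in.size(); i++) {
    float val = in[i];  // placeholder for the convolution result at element i
    t(acc.data(), i, val);
  }
}

// Laplacian: the first partial derivative initializes the accumulator, the remaining
// ones are summed into it in the same pass as their own convolution.
void Laplacian2D(std::vector<float> &acc, const std::vector<float> &in,
                 float scale_x, float scale_y) {
  RunPartialDeriv(acc, in, ScaleWrite{scale_x});
  RunPartialDeriv(acc, in, ScaleAdd{scale_y});
}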
Additional information
Checklist
Tests
Documentation
DALI team only
Requirements
REQ IDs: N/A
JIRA TASK: DALI-2471