-
Notifications
You must be signed in to change notification settings - Fork 622
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Per frame affine transforms #3946
Conversation
72643bc
to
e26dd87
Compare
!build |
CI MESSAGE: [5033445]: BUILD STARTED |
CI MESSAGE: [5033445]: BUILD FAILED |
!build |
CI MESSAGE: [5033491]: BUILD STARTED |
CI MESSAGE: [5033491]: BUILD FAILED |
CI MESSAGE: [5033491]: BUILD PASSED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly small nitpicks and complaints to test, I don't see any significant issues in the operator code, in fact it is really nice that we can add this sequence/per-frame support via expansion with so little changes.
@@ -361,32 +358,50 @@ class SequenceOperator : public Operator<Backend>, protected SampleBroadcasting< | |||
void SetupSequenceOperator(const workspace_t<Backend> &ws) { | |||
auto num_inputs = ws.NumInput(); | |||
input_expand_desc_.clear(); | |||
input_expand_desc_.reserve(num_inputs); | |||
input_expand_desc_.reserve(num_inputs + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a fan of the input_expand_desc_
having one more element, why can't we directly use the expand_like_
as the source of truth desc? Especially if it won't be used some times (the additional element, it can be error prone).
Maybe make it testable as a boolean, so we know that if it's empty we don't expand?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @klecki about not adding the last element to input_expand_desc_
. How about making expand_like_
a unique_ptr that InferReferenceExpandDesc
returns. a nullptr would mean nothing to expand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I like the unique pointer idea.
source=dummy_source(input_data), layout=input_layout) | ||
if device == "gpu": | ||
input = input.gpu() | ||
def pipeline(input_data: ArgData, args_data: List[ArgData]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the difference between input_data
and args_data
if args_data
can have positional inputs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are the same type, maybe those can be concatenated outside of the pipeline and we would have half of the ifs here removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, it can. Thanks for catching it that. Somewhere along the refactoring I already tried it but there were some issues with specifying the device, but now it was no problem and it simplified the function indeed,
return [expand_arg(input_layout, num_expand, arg_has_frames, input_batch, arg_batch) | ||
for input_batch, arg_batch in zip(input_data, arg_data)] | ||
|
||
def expand_arg_input(input_data: ArgData, arg_data: ArgData): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it replicating the arg_data
if it's not per-frame in manner corresponding to the input data? I would not mind some small docstring here or there :P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, added the info about it in a docstr.
if expanded_axis is not None: | ||
output_size_params += (expanded_axis.data,) | ||
output_sizes = [ | ||
sequence_batch_output_size(*args) | ||
for args in zip(*output_size_params)] | ||
expanded_params.append(ArgData(ArgDesc("size", False, "cpu"), output_sizes)) | ||
expanded_params.append(ArgData(ArgDesc("size", "", "cpu"), output_sizes)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a suggestion, maybe use names of the arguments in ArgDesc?
expanded_params.append(ArgData(ArgDesc("size", "", "cpu"), output_sizes)) | |
expanded_params.append(ArgData( | |
ArgDesc(name="size", expandable_prefix="", dest_device="cpu"), | |
output_sizes) | |
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
class TransformsParamsProvider(ParamsProvider): | ||
def unfold_output_layout(self, layout): | ||
unfolded = super().unfold_output_layout(layout) | ||
if unfolded == "**": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we get "**"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Transform ops propagate the information about sequence layout to reduce the need for fn.per_frame in the intermediate steps when composing multiple transforms. So if you specify args per-frame, the output matrices have F** layout. The base unfold_output_layout unfolds the outputs (of the layout F**, ) and drops the prefix of the layout (F** -> **). Usually it is OK, because it means something like FHWC -> HWC, but "**" is meaningless and baseline transform op run with no per frame input produces outputs with no layout.
] | ||
|
||
seq_cases = test_cases + only_with_seq_input_cases | ||
yield from sequence_suite_helper(rng, "F", [("F**", mt_seq_input)], seq_cases, num_iters) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I guess we just remove F
in Python test to generate the baseline?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep.
# transform the test cases to test the transforms with per-frame args but: | ||
# 1. with the positional input that does not contain frames | ||
# 2. without the positional input | ||
for tested_fn, fixed_params, params_provider, devices in test_cases: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, just skimmed this.
@@ -176,3 +178,61 @@ def _test_empty_input(device): | |||
def test_empty_input(): | |||
for device in ["cpu", "gpu"]: | |||
yield _test_empty_input, device | |||
|
|||
|
|||
def test_sequences(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I didn't read this yet.
auto expand_like_idx = GetReferenceInputIdx(); | ||
assert(expand_like_idx == -1 || GetInputExpandDesc(expand_like_idx).NumDimsToExpand() > 0); | ||
return expand_like_idx >= 0; | ||
assert(expand_like_ == nullptr || expand_like_->NumDimsToExpand() > 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you explain the reasoning behind this assert?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It stipulates that input used as a reference for expansion must be sequence-like. Just to avoid repeated code of the form
- GetReferentialExpandDesc();
- Make sure it is an actual sequence, i.e. there are some dims to expand.
return input_idx; | ||
const ExpandDesc *InferNamedReferenceExpandDesc(const workspace_t<Backend> &ws) { | ||
for (const auto &arg_input : ws) { | ||
auto &shared_tvec = arg_input.second.tvec; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto &shared_tvec = arg_input.second.tvec; | |
auto *shared_tvec = arg_input.second.tvec; |
would read better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a shared_ptr. It is taken by reference to avoid unnecessary counter increment.
@@ -361,32 +358,50 @@ class SequenceOperator : public Operator<Backend>, protected SampleBroadcasting< | |||
void SetupSequenceOperator(const workspace_t<Backend> &ws) { | |||
auto num_inputs = ws.NumInput(); | |||
input_expand_desc_.clear(); | |||
input_expand_desc_.reserve(num_inputs); | |||
input_expand_desc_.reserve(num_inputs + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @klecki about not adding the last element to input_expand_desc_
. How about making expand_like_
a unique_ptr that InferReferenceExpandDesc
returns. a nullptr would mean nothing to expand.
b123d27
to
579a2df
Compare
!build |
CI MESSAGE: [5114301]: BUILD STARTED |
CI MESSAGE: [5114301]: BUILD PASSED |
Signed-off-by: Kamil Tokarski <[email protected]>
Signed-off-by: Kamil Tokarski <[email protected]>
Signed-off-by: Kamil Tokarski <[email protected]>
Signed-off-by: Kamil Tokarski <[email protected]>
Signed-off-by: Kamil Tokarski <[email protected]>
Signed-off-by: Kamil Tokarski <[email protected]>
…of truth of the sequence shape Signed-off-by: Kamil Tokarski <[email protected]>
Signed-off-by: Kamil Tokarski <[email protected]>
Signed-off-by: Kamil Tokarski <[email protected]>
Signed-off-by: Kamil Tokarski <[email protected]>
Signed-off-by: Kamil Tokarski <[email protected]>
Signed-off-by: Kamil Tokarski <[email protected]>
Signed-off-by: Kamil Tokarski <[email protected]>
Signed-off-by: Kamil Tokarski <[email protected]>
3e503de
to
754d1d0
Compare
!build |
CI MESSAGE: [5129150]: BUILD STARTED |
CI MESSAGE: [5129150]: BUILD PASSED |
(fn.transforms.shear, {}, TransformsParamsProvider( | ||
[ArgCb("angles", shear_angles, True)]), ["cpu"]), | ||
(fn.transforms.shear, {}, TransformsParamsProvider( | ||
[ArgCb("shear", shift, True)]), ["cpu"]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about testing arguments like shift
as a proper scalar value (not only as argument input that is per frame or not per frame)?
Probably not important, the baseline implementation after unfolding should handle it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a scalar case (and a few more per-frame non-per frame compose cases.
Signed-off-by: Kamil Tokarski <[email protected]>
…lity Signed-off-by: Kamil Tokarski <[email protected]>
!build |
CI MESSAGE: [5132897]: BUILD STARTED |
CI MESSAGE: [5132897]: BUILD PASSED |
Category:
Description:
Adds support for sequence input to affine transforms and coordinate transforms ops. This should facilitate use cases of the
warp_affine
operator with video. For the same reason, thecoord_transform
operator is extended to support sequences.The change required modification of the SequenceOperator so that the named arguments may be treated as a source of truth as to whether the operator processes sequences. This helps in two ways. Firstly, operator may have no positional input but should produce per-frame output, for example
transforms.rotation
may receive per-frame angles as a named argument and should produce per-frame rotation matrices based on that. Secondly,transforms.rotation
may receive input matrices on per-sample basis (for instance from Constant op) and then compose them with per-frame matrices produced based on per-frame angles.Corresponding changes was made in the sequences testing utility. Previously the utility expected sequence-containing batches for the 0-th input of the operator (while the rest of the params was specified as callbacks that the utility used to produce inputs for arguments that matches the shape of the input). Now, you can specify to which argument the provided batches should be fed, effectively making the arbitrary argument "a source of truth" as to the shape of the input sequences.
Additional information:
Affected modules and functionalities:
The functionalities of following operators are affected:
The
SequenceOperator
is modified to handle transforms. It is already utilized by gaussian blur, laplacian, rotate and warp operators but the change should not affect their functionalities.It duplicates the fix for the tests to pass.#3958
This PR was rebased on that one.
Key points relevant for the review:
Tests:
Checklist
Documentation
DALI team only
Requirements
REQ IDs: N/A
JIRA TASK: DALI-2795