
Per frame affine transforms #3946

Merged: 16 commits into NVIDIA:main on Jun 21, 2022
Conversation

@stiepan (Member) commented May 31, 2022:

Category:

Description:

  • Adds support for sequence inputs to the affine transforms and coordinate transforms ops. This should facilitate use cases of the warp_affine operator with video; for the same reason, the coord_transform operator is extended to support sequences (see the usage sketch after this list).

  • The change required modifying the SequenceOperator so that named arguments may be treated as a source of truth as to whether the operator processes sequences. This helps in two ways. Firstly, an operator may have no positional input but should still produce per-frame output; for example, transforms.rotation may receive per-frame angles as a named argument and should produce per-frame rotation matrices based on them. Secondly, transforms.rotation may receive input matrices on a per-sample basis (for instance from the Constant op) and then compose them with the per-frame matrices produced from the per-frame angles.

  • Corresponding changes were made in the sequence-testing utility. Previously, the utility expected sequence-containing batches for the 0-th input of the operator (while the rest of the params were specified as callbacks that the utility used to produce inputs for arguments that match the shape of the input). Now, you can specify which argument the provided batches should be fed to, effectively making an arbitrary argument "a source of truth" as to the shape of the input sequences.
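
A minimal usage sketch of what this enables (illustrative only; the pipeline structure and names such as video and angles are assumptions, not code from this PR):

    from nvidia.dali import fn, pipeline_def

    @pipeline_def(batch_size=2, num_threads=2, device_id=0)
    def video_rotate():
        video = fn.external_source(name="video", layout="FHWC")   # frame sequences
        angles = fn.external_source(name="angles")                # one angle per frame
        # fn.per_frame marks the named argument as carrying per-frame values,
        # so transforms.rotation produces one rotation matrix per frame
        mt = fn.transforms.rotation(angle=fn.per_frame(angles))
        return fn.warp_affine(video, matrix=mt)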

Additional information:

Affected modules and functionalities:

The functionality of the following operators is affected:

  • coord_transform
  • transforms.combine
  • transforms.crop
  • transforms.rotation
  • transforms.scale
  • transforms.shear
  • transforms.translation

The SequenceOperator is modified to handle transforms. It is already utilized by the gaussian blur, laplacian, rotate, and warp operators, but the change should not affect their functionality.

This PR duplicates the fix from #3958 so that the tests pass; it was rebased on that one.

Key points relevant for the review:

Tests:

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: DALI-2795

@stiepan force-pushed the per_frame_transforms branch from 72643bc to e26dd87 on June 6, 2022 22:15
@stiepan marked this pull request as ready for review on June 7, 2022 11:38
@stiepan (Member Author) commented Jun 7, 2022:

!build

@dali-automaton (Collaborator): CI MESSAGE: [5033445]: BUILD STARTED

@dali-automaton (Collaborator): CI MESSAGE: [5033445]: BUILD FAILED

@stiepan (Member Author) commented Jun 7, 2022:

!build

@dali-automaton (Collaborator): CI MESSAGE: [5033491]: BUILD STARTED

@dali-automaton (Collaborator): CI MESSAGE: [5033491]: BUILD FAILED

@dali-automaton (Collaborator): CI MESSAGE: [5033491]: BUILD PASSED

@klecki (Contributor) left a comment:

Mostly small nitpicks and complaints about the tests; I don't see any significant issues in the operator code. In fact, it is really nice that we can add this sequence/per-frame support via expansion with so few changes.

@@ -361,32 +358,50 @@ class SequenceOperator : public Operator<Backend>, protected SampleBroadcasting<
   void SetupSequenceOperator(const workspace_t<Backend> &ws) {
     auto num_inputs = ws.NumInput();
     input_expand_desc_.clear();
-    input_expand_desc_.reserve(num_inputs);
+    input_expand_desc_.reserve(num_inputs + 1);
Contributor:

Not a fan of input_expand_desc_ having one more element; why can't we directly use expand_like_ as the source-of-truth desc? Especially since the additional element won't always be used (that can be error-prone).
Maybe make it testable as a boolean, so we know that if it's empty we don't expand?

Contributor:

I agree with @klecki about not adding the last element to input_expand_desc_. How about making expand_like_ a unique_ptr that InferReferenceExpandDesc returns? A nullptr would mean there is nothing to expand.

Member Author (@stiepan):

Thanks, I like the unique pointer idea.
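
A minimal sketch of the unique_ptr idea being discussed (shape assumed; FindExpandableInput is a hypothetical helper standing in for the real search over inputs, not the actual DALI code):

    // Returns the descriptor of the input chosen as the reference for
    // expansion; nullptr means there is no sequence-like input at all.
    std::unique_ptr<ExpandDesc> InferReferenceExpandDesc(const workspace_t<Backend> &ws) {
      if (const ExpandDesc *desc = FindExpandableInput(ws))  // hypothetical helper
        return std::make_unique<ExpandDesc>(*desc);
      return nullptr;
    }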

        source=dummy_source(input_data), layout=input_layout)
    if device == "gpu":
        input = input.gpu()

def pipeline(input_data: ArgData, args_data: List[ArgData]):
Contributor:

What is the difference between input_data and args_data if args_data can have positional inputs?

Contributor:

They are the same type; maybe they could be concatenated outside of the pipeline, which would remove half of the ifs here.

Member Author (@stiepan):

Yep, it can. Thanks for catching that. Somewhere along the refactoring I had already tried it, but there were some issues with specifying the device; now it was no problem, and it did simplify the function.

    return [expand_arg(input_layout, num_expand, arg_has_frames, input_batch, arg_batch)
            for input_batch, arg_batch in zip(input_data, arg_data)]

def expand_arg_input(input_data: ArgData, arg_data: ArgData):
Contributor:

Is it replicating the arg_data, if it's not per-frame, in a manner corresponding to the input data? I would not mind a small docstring here or there :P

Member Author (@stiepan):

Yes, I added the info about it in a docstring.
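
A hedged sketch of what such a docstring could say (wording assumed, not copied from the PR):

    def expand_arg_input(input_data: ArgData, arg_data: ArgData):
        """Expands argument batches to match the unfolded positional input.

        If `arg_data` is per-frame, its samples are unfolded into individual
        frames; otherwise each sample is replicated once per frame of the
        corresponding `input_data` sample, so that after expansion the
        argument batch and the input batch line up one-to-one.
        """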

    if expanded_axis is not None:
        output_size_params += (expanded_axis.data,)
    output_sizes = [
        sequence_batch_output_size(*args)
        for args in zip(*output_size_params)]
-   expanded_params.append(ArgData(ArgDesc("size", False, "cpu"), output_sizes))
+   expanded_params.append(ArgData(ArgDesc("size", "", "cpu"), output_sizes))
Contributor:

Just a suggestion: maybe use the names of the arguments in ArgDesc?

Suggested change:
-    expanded_params.append(ArgData(ArgDesc("size", "", "cpu"), output_sizes))
+    expanded_params.append(ArgData(
+        ArgDesc(name="size", expandable_prefix="", dest_device="cpu"),
+        output_sizes))

Member Author (@stiepan):

done

class TransformsParamsProvider(ParamsProvider):
    def unfold_output_layout(self, layout):
        unfolded = super().unfold_output_layout(layout)
        if unfolded == "**":
Contributor:

Why do we get "**"?

@stiepan (Member Author) commented Jun 17, 2022:

Transform ops propagate the information about the sequence layout to reduce the need for fn.per_frame in intermediate steps when composing multiple transforms. So if you specify args per-frame, the output matrices have the "F**" layout. The base unfold_output_layout unfolds the outputs (of the "F**" layout) and drops the prefix of the layout ("F**" -> "**"). Usually that is OK, because it means something like "FHWC" -> "HWC", but "**" is meaningless, and a baseline transform op run with no per-frame input produces outputs with no layout.
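
Putting that explanation together with the hunk above, a sketch of what the override ends up doing (the exact body is assumed, not copied from the PR):

    class TransformsParamsProvider(ParamsProvider):
        def unfold_output_layout(self, layout):
            unfolded = super().unfold_output_layout(layout)  # e.g. "F**" -> "**"
            if unfolded == "**":
                return ""  # match the baseline run, whose outputs carry no layout
            return unfolded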

]

seq_cases = test_cases + only_with_seq_input_cases
yield from sequence_suite_helper(rng, "F", [("F**", mt_seq_input)], seq_cases, num_iters)
Contributor:

Oh, I guess we just remove the "F" in the Python test to generate the baseline?

Member Author (@stiepan):

Yep.

# transform the test cases to test the transforms with per-frame args but:
# 1. with the positional input that does not contain frames
# 2. without the positional input
for tested_fn, fixed_params, params_provider, devices in test_cases:
Contributor:

FYI, just skimmed this.

@@ -176,3 +178,61 @@ def _test_empty_input(device):
def test_empty_input():
    for device in ["cpu", "gpu"]:
        yield _test_empty_input, device


def test_sequences():
Contributor:

And I haven't read this yet.

-    auto expand_like_idx = GetReferenceInputIdx();
-    assert(expand_like_idx == -1 || GetInputExpandDesc(expand_like_idx).NumDimsToExpand() > 0);
-    return expand_like_idx >= 0;
+    assert(expand_like_ == nullptr || expand_like_->NumDimsToExpand() > 0);
Contributor:

Can you explain the reasoning behind this assert?

@stiepan (Member Author) commented Jun 17, 2022:

It stipulates that an input used as a reference for expansion must be sequence-like. It is just there to avoid repeated code of the form (see the sketch below):

  1. GetReferentialExpandDesc();
  2. make sure it is an actual sequence, i.e. that there are some dims to expand.
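
In other words, a check like the following can rely on that invariant (a sketch based on the hunk above; the method name and return statement are assumptions):

    // With the invariant in place, nullptr is the only way to say "do not
    // expand"; a non-null expand_like_ is guaranteed to be a real sequence.
    bool ShouldExpand() const {
      assert(expand_like_ == nullptr || expand_like_->NumDimsToExpand() > 0);
      return expand_like_ != nullptr;
    }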

    return input_idx;

  const ExpandDesc *InferNamedReferenceExpandDesc(const workspace_t<Backend> &ws) {
    for (const auto &arg_input : ws) {
      auto &shared_tvec = arg_input.second.tvec;
Contributor:

Suggested change:
-      auto &shared_tvec = arg_input.second.tvec;
+      auto *shared_tvec = arg_input.second.tvec;

would read better.

@stiepan (Member Author) commented Jun 17, 2022:

It is a shared_ptr. It is taken by reference to avoid an unnecessary reference-count increment.
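
A small illustration of that point (generic C++, not DALI code): copying a shared_ptr bumps the atomic reference count, while binding it by reference does not:

    #include <memory>

    void by_ref(const std::shared_ptr<int> &p) {  // no ref-count traffic
      if (p) { /* use *p */ }
    }

    void by_copy(std::shared_ptr<int> p) {        // atomic increment + decrement
      if (p) { /* use *p */ }
    }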

@stiepan force-pushed the per_frame_transforms branch from b123d27 to 579a2df on June 17, 2022 15:21
@stiepan (Member Author) commented Jun 17, 2022:

!build

@dali-automaton (Collaborator): CI MESSAGE: [5114301]: BUILD STARTED

@dali-automaton (Collaborator): CI MESSAGE: [5114301]: BUILD PASSED

@stiepan force-pushed the per_frame_transforms branch from 3e503de to 754d1d0 on June 20, 2022 09:43
@stiepan (Member Author) commented Jun 20, 2022:

!build

@dali-automaton (Collaborator): CI MESSAGE: [5129150]: BUILD STARTED

@dali-automaton (Collaborator): CI MESSAGE: [5129150]: BUILD PASSED

    (fn.transforms.shear, {}, TransformsParamsProvider(
        [ArgCb("angles", shear_angles, True)]), ["cpu"]),
    (fn.transforms.shear, {}, TransformsParamsProvider(
        [ArgCb("shear", shift, True)]), ["cpu"]),
@klecki (Contributor) commented Jun 20, 2022:

How about testing arguments like shift as a proper scalar value (not only as an argument input that is per-frame or not per-frame)?

Probably not important; the baseline implementation after unfolding should handle it.

Member Author (@stiepan):

Added a scalar case (and a few more per-frame / non-per-frame compose cases).
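
For illustration, such a scalar case could look roughly like the tuples in the hunk above (the values and the empty callback list are assumptions, not the PR's actual addition):

    (fn.transforms.shear, {"shear": [1.0, 2.0]}, TransformsParamsProvider([]), ["cpu"]),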

@stiepan (Member Author) commented Jun 20, 2022:

!build

@dali-automaton (Collaborator): CI MESSAGE: [5132897]: BUILD STARTED

@dali-automaton (Collaborator): CI MESSAGE: [5132897]: BUILD PASSED

@stiepan merged commit d3ecce5 into NVIDIA:main on Jun 21, 2022
@JanuszL mentioned this pull request on Jan 11, 2023