Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Common Subgraph Elimination #5752

Merged
merged 11 commits into from
Dec 18, 2024
Merged

Common Subgraph Elimination #5752

merged 11 commits into from
Dec 18, 2024

Conversation

mzient
Copy link
Contributor

@mzient mzient commented Dec 16, 2024

Category:

New feature (non-breaking change which adds functionality)

Description:

This change introduces a graph-level optimization that removes common subgraphs.
The algorithm goes over graph nodes in topological order and computes a key from the operator's schema name, arguments and (renamed) inputs. If the operator is found, it's replaced with the previously found one and the output names are added to the renaming map. When subsequent operators are processed, the inputs of the replaced operator are renamed accordingly.
Example:

A = foo(arg=42)  # new
B = foo(arg=42)  # replaced with A
C = bar(A) # new
D = bar(B) # replaced with bar(A)
return C + D  # replaced with C + C

This optimization is particularly useful with constants, the use of .shape function and subscripts.
In the following example:

horz_center = (bbox[0] + bbox[2]) / 2
width = (bbox[2]  - bbox[0])

The bbox[0] and bbox[2] will be reused.

Additional information:

Affected modules and functionalities:

Key points relevant for the review:

Tests:

Some Python tests needed to be adjusted.

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: DALI-4159

mzient and others added 6 commits December 16, 2024 14:49
Signed-off-by: Michal Zientkiewicz <[email protected]>
Signed-off-by: Michal Zientkiewicz <[email protected]>
Signed-off-by: Michal Zientkiewicz <[email protected]>
Signed-off-by: Michal Zientkiewicz <[email protected]>
@@ -41,7 +41,8 @@ inline std::shared_ptr<Argument> DeserializeProtobufVectorImpl(const DaliProtoPr
auto args = arg.extra_args();
std::vector<T> ret_val;
for (auto& a : args) {
const T& elem = DeserializeProtobuf(a)->Get<T>();
Copy link
Contributor Author

@mzient mzient Dec 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After changing Argument::Get to returning a reference, this became a bug: returning a reference to a field of the result of DeserializeProtobuf - since we get a temporary shared_ptr, it would be destroyed (along with the referenced field!) before the field reference could be used.

Comment on lines +585 to +587
ref = pipe_out_cpu
else:
check_batch(pipe_out_cpu, ref)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test checked too little and took too long - now it's more accurate and much faster.

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [21516079]: BUILD STARTED

- restore _module and _display name in the default schema
- add preserve(_name) in Pipeline tests where CSE is unwanted
- fix and extend Python tests
- extend documentation

Signed-off-by: Michał Zientkiewicz <[email protected]>
Signed-off-by: Michał Zientkiewicz <[email protected]>
Signed-off-by: Michał Zientkiewicz <[email protected]>
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [21520256]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [21520256]: BUILD PASSED

Signed-off-by: Michal Zientkiewicz <[email protected]>
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [21574171]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [21574171]: BUILD PASSED

class CSE {
public:
void Run(OpGraph &graph) {
for (auto &node : graph.OpNodes())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OpNodes returns the nodes in topological order?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the graph is sorted by the Builder.

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [21610648]: BUILD STARTED

bool per_frame = false; //< The (tensor) argument can be expanded to multiple frames
bool internal = false; //< The argument cannot be set by the user in Python
bool hidden = false; //< The argument doesn't appear in the documentation
bool ignore_cmp = false; //< Two operators can be considered equal if this argument differs
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick

Suggested change
bool ignore_cmp = false; //< Two operators can be considered equal if this argument differs
bool ignore_cmp = false; //< Two operators can be considered equal even if this argument differs

Comment on lines +100 to +102
AddInternalArg("preserve_name", R"(When true, the operator cannot be renamed.
This disables merging this operator with another one with a different name.)",
false);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be the case always when an operator was explicitly named by the user?

I understand that having two equal ops (Readers?) but with different names is not a clever way to use DALI, but it is possible for the user to do that. Let's say that for whatever reason I have two readers doing the same work, but named differently ("Reader1", "Reader2"). I wrapped the pipeline into something (an iterator?) that relies on a presence of "Reader2" in the pipeline. This is not a good code, but it works. And after CSE it doesn't

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's exactly how it works. If you specify a name manually, then preserve_name is set to True.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see:

self._spec.AddArg("preserve_name", not self._autoname)

Thank you!

@@ -2333,3 +2332,61 @@ def def_ref():
(ref,) = ref_pipe.run()
check_batch(cpu, ref, bs, 0, 0, "HWC")
check_batch(gpu, ref, bs, 0, 0, "HWC")


def test_cse():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding a test with External Source (to make sure it's not eliminated) and manually named operators?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Manually named operator: see line 2343.
ES - we already have tests with multiple external sources and they did fail before adding the fix, so I'd call it covered.

@mzient mzient merged commit 785b2ed into NVIDIA:main Dec 18, 2024
5 of 6 checks passed
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [21610648]: BUILD PASSED

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants