compiler: Misc improvements to code generation #2516

FabioLuporini · 2025-01-14T10:33:07Z

In essence, this is a batch of tweaks to support GPU features in PRO.

FabioLuporini · 2025-01-14T10:34:10Z

devito/ir/clusters/cluster.py

@@ -540,19 +544,3 @@ def reduce_properties(clusters):
            properties[d] = normalize_properties(properties.get(d, v), v)

    return Properties(properties)
-
-
-def tailor_properties(properties, ispace):


Note for reviewers: tailor_properties has finally moved to ir/support/properties, as promised in an old PR.

codecov · 2025-01-14T10:51:47Z

Codecov Report

Attention: Patch coverage is 76.92308% with 45 lines in your changes missing coverage. Please review.

Project coverage is 87.30%. Comparing base (f71764a) to head (b8de9ec).

Files with missing lines	Patch %	Lines
devito/arch/archinfo.py	42.42%	19 Missing ⚠️
devito/ir/support/properties.py	76.74%	6 Missing and 4 partials ⚠️
devito/passes/clusters/misc.py	78.57%	5 Missing and 4 partials ⚠️
devito/arch/compiler.py	42.85%	4 Missing ⚠️
tests/test_mpi.py	60.00%	2 Missing ⚠️
devito/types/dense.py	66.66%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #2516      +/-   ##
==========================================
- Coverage   87.31%   87.30%   -0.02%     
==========================================
  Files         238      238              
  Lines       45847    45972     +125     
  Branches     4060     4074      +14     
==========================================
+ Hits        40033    40134     +101     
- Misses       5129     5150      +21     
- Partials      685      688       +3

devito/arch/archinfo.py

devito/arch/compiler.py

georgebisbas · 2025-01-14T12:38:45Z

devito/passes/clusters/misc.py

+
+            # Process the `weak` part of the key
+            for i in reversed(range(len(k.weak) + 1)):
+                choosable = [e for e in candidates if m[e].weak[:i] == k.weak[:i]]


I don't like this variable name, but OK.

The choosable among the candidates. I think it's OK! Why don't you like it? I'm open to alternatives!

Hm, not sure... maybe something like filtered, valid, or eligible?
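
For readers following the naming discussion, the loop in the diff can be read as a longest-prefix match on the weak part of the key. Below is a minimal sketch, assuming the loop stops at the first non-empty match (the diff excerpt does not show that step). `Key`, `pick_choosable` and the sample data are hypothetical and are not Devito's code.

```python
from collections import namedtuple

# Hypothetical stand-in for the fusion key; only the `weak` part is modelled
Key = namedtuple('Key', 'weak')


def pick_choosable(k, candidates, m):
    """Return the candidates whose weak key shares the longest prefix with k.weak."""
    # Try the full weak key first, then progressively shorter prefixes
    for i in reversed(range(len(k.weak) + 1)):
        choosable = [e for e in candidates if m[e].weak[:i] == k.weak[:i]]
        if choosable:
            # Assumption of this sketch: the first (longest) match wins
            return choosable
    # Only reached when there are no candidates at all
    return []


# Hypothetical data: `m` maps each candidate to its key
m = {'c0': Key(('x', 'y')), 'c1': Key(('x', 'z')), 'c2': Key(('w',))}
k = Key(('x', 'z', 't'))
print(pick_choosable(k, ['c0', 'c1', 'c2'], m))
```

Because the empty prefix (i == 0) matches every candidate, the result is non-empty whenever at least one candidate exists, so the shorter prefixes act as progressively weaker fallbacks.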

mloubout

Some minor comments but looks straightforward to me.

devito/passes/clusters/misc.py

devito/types/basic.py

devito/arch/archinfo.py

mloubout · 2025-01-14T15:20:45Z

tests/test_mpi.py

@@ -21,7 +21,10 @@
 from devito.tools import Bunch

 from examples.seismic.acoustic import acoustic_setup
-from tests.test_dse import TestTTI
+try:
+    from tests.test_dse import TestTTI


Why not just from .test_dse import TestTTI? If this file is being run, test_dse exists alongside it, so there should not be any case where it cannot be imported here.

Actually, just from test_dse import TestTTI; see e.g.

devito/tests/test_autotuner.py

Line 153 in da2c9a4

from test_dse import TestTTI
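
The trade-off in this thread is between a plain import, which fails as soon as test_dse is not importable, and a guarded import that lets the dependent test be skipped instead. Below is a minimal standard-library sketch of the guarded variant. `optional_import` is a hypothetical helper for illustration only; it is not part of Devito or pytest, and it is not the code in the diff.

```python
import importlib


def optional_import(module, name):
    """Return `name` from `module`, or None if either is unavailable."""
    try:
        mod = importlib.import_module(module)
    except ImportError:
        # Also covers ModuleNotFoundError, e.g. when the `tests` package is not visible
        return None
    # A missing attribute is treated like a failed import
    return getattr(mod, name, None)


# Hypothetical usage: tests that depend on TestTTI can be skipped when it is None
TestTTI = optional_import('tests.test_dse', 'TestTTI')
```

The plain `from test_dse import TestTTI` suggested above is simpler and relies on the test directory being on the import path when the test runner starts.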

EdCaunt · 2025-01-20T10:53:40Z

devito/arch/archinfo.py

+            # Asynchronous pipeline loads -- introduced in Ampere
+            return True
+        elif query == 'tma' and cc >= 90:
+            # Tensor Memory Acceleratory -- introduced in Hopper


Typo: "acceleratory"

I think I'll postpone it until the next PR, unless I'm really convinced a further push is required by this one.

Never mind, I had to push. Fixed.

EdCaunt · 2025-01-20T11:08:20Z

devito/arch/archinfo.py

+            return False
+
+        cc = get_nvidia_cc()
+        if query == 'async-loads' and cc >= 80:


Why are the checks for CUDA compiler version not in the Ampere and Hopper classes?

Suggestion: why not something along the lines of

    class NvidiaDevice(Device):
        _supports = ()
        _mincuda = None

        def supports(self, query, language=None):
            if language != 'cuda':
                return False
            cc = get_nvidia_cc()
            return query in _supports and (_mincuda is None or cc >= _mincuda)

    class Volta(NvidiaDevice):
        pass

    class Ampere(Volta):
        _supports = super()._supports + ('async-loads',)
        _mincuda = 80

    class Hopper(Ampere):
        _supports = super()._supports + ('tma',)
        _mincuda = 90

    class Blackwell(Hopper):
        pass

Whilst this could probably be made more elegant, it does reduce the number of lines of code and reduce code duplication.

Maybe this idea is too unsophisticated though, since one could conceivably have high-end hardware but outdated compilers

You cannot call super() in the class body when defining class-level members (attributes & methods).

And yes, you may have older or newer CUDA versions on different architectures.

I suggest we refrain from over-engineering this until we truly need to (if ever, hopefully never), but I will consider pushback.
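
For reference, the table-driven flavour of the suggestion can be written without super() in the class body by naming the parent class explicitly. Below is a minimal sketch under stated assumptions: `get_nvidia_cc` is stubbed here in place of the real query used in the diff, the per-feature `_min_cc` mapping is an invention of this sketch, and none of this is the code merged in the PR.

```python
def get_nvidia_cc():
    """Stub standing in for the real compute-capability query (assumption)."""
    return 80  # pretend we are running on an Ampere-class device


class NvidiaDevice:
    # Maps a feature name to the minimum compute capability that provides it
    _min_cc = {}

    def supports(self, query, language=None):
        if language != 'cuda':
            return False
        min_cc = self._min_cc.get(query)
        return min_cc is not None and get_nvidia_cc() >= min_cc


class Volta(NvidiaDevice):
    pass


class Ampere(Volta):
    # Name the parent explicitly: zero-argument super() fails in a class body
    _min_cc = {**Volta._min_cc, 'async-loads': 80}


class Hopper(Ampere):
    _min_cc = {**Ampere._min_cc, 'tma': 90}


print(Ampere().supports('async-loads', language='cuda'))
print(Hopper().supports('tma', language='cuda'))
```

The thresholds 80 and 90 are the ones in the diff. Whether the extra indirection is worth it is exactly the over-engineering question raised above.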

EdCaunt · 2025-01-20T11:13:07Z

devito/finite_differences/differentiable.py

@@ -730,6 +730,12 @@ def __init_finalize__(self, *args, **kwargs):

        super().__init_finalize__(*args, **kwargs)

+    @classmethod
+    def class_key(cls):
+        # Ensure Weights appear before any other AbstractFunction


Why is this needed? Genuinely curious

sympy voodoo, sigh

To avoid, e.g., unvectorized CUDA code looking like a[i]*b[i] but like ab[i].y*ab[i].x once vectorized (i.e., with the operands flipped). In short, it keeps the order of arithmetic operations consistent.

something along these lines
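
The mechanism behind this is SymPy's sort keys: class_key is the first component of sort_key, which default_sort_key uses when ordering expressions. Below is a minimal sketch with plain SymPy, assuming SymPy is installed. `Field`, `PlainWeight`, `Weight` and the returned key are hypothetical stand-ins; they are not Devito's classes or the values used in the PR.

```python
from sympy import Symbol, default_sort_key


class Field(Symbol):
    """Stand-in for a generic function-like symbol (hypothetical)."""


class PlainWeight(Symbol):
    """Inherits the default class_key, which for atoms is (2, 0, class name)."""


class Weight(Symbol):
    @classmethod
    def class_key(cls):
        # Assumption of this sketch: a smaller first entry than the default
        # makes instances sort before any other Symbol subclass
        return (1, 0, cls.__name__)


def ordered_names(*symbols):
    """Names of the symbols in SymPy's canonical sort order."""
    return [s.name for s in sorted(symbols, key=default_sort_key)]


# Default keys tie on the leading entries, so the class name decides
print(ordered_names(PlainWeight('w'), Field('a')))
# With the override, the weight comes first regardless of names
print(ordered_names(Field('a'), Weight('w')))
```

Pinning the class key makes the relative order of weights and other symbols independent of their names, which is the consistency property described above.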

FabioLuporini requested review from mloubout, georgebisbas, EdCaunt and JDBetteridge January 14, 2025 10:33

mloubout added the compiler label Jan 14, 2025

georgebisbas reviewed Jan 14, 2025

mloubout approved these changes Jan 14, 2025

mloubout reviewed Jan 14, 2025

FabioLuporini force-pushed the async-loads-final-2 branch from 1c11eae to 72f4701 Compare January 14, 2025 15:24

mloubout approved these changes Jan 14, 2025

mloubout force-pushed the async-loads-final-2 branch from 72f4701 to 82fa700 Compare January 14, 2025 19:30

georgebisbas approved these changes Jan 15, 2025

EdCaunt reviewed Jan 20, 2025

FabioLuporini force-pushed the async-loads-final-2 branch 2 times, most recently from 181b588 to 191a162 Compare January 23, 2025 09:25

georgebisbas approved these changes Jan 23, 2025

FabioLuporini added 12 commits January 24, 2025 09:12

arch: Support querying arch properties

9041763

compiler: Fixup ModuloDimension abstraction

6d0b3e2

compiler: Fix Function reconstruction w custom halo

84fc4f0

arch: Add some Nvidia archs

2537280

compiler: Set Function._mem_heap

770a4cb

compiler: Honour input deriv ordering

ce3c65e

compiler: Make CompAccess honour base assumptions

b5ed317

compiler: Ensure Weights always get printed before any other expr

de4543a

compiler: Refactor _toposort

6b70771

compiler: Improve topo-fusion

abe8ba5

compiler: Tweak topo-fusion again

c2eb56b

compiler: Improve topo-fusion, minor

dba62be

FabioLuporini added 4 commits January 24, 2025 09:12

compiler: Make generation of ComponentAccess deterministic

21c3766

tests: Skip MPI test if test_dse not visible

a8d15b7

compiler: Tweak topo-fusion

3d85383

arch: Add more nvidia archs

b8de9ec

FabioLuporini force-pushed the async-loads-final-2 branch from 191a162 to b8de9ec Compare January 24, 2025 09:13
