Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compiler: Misc improvements to code generation #2516

Open
wants to merge 16 commits into
base: master
Choose a base branch
from

Conversation

FabioLuporini
Copy link
Contributor

In essence, this is a batch of tweaks to support GPU features in PRO

@@ -540,19 +544,3 @@ def reduce_properties(clusters):
properties[d] = normalize_properties(properties.get(d, v), v)

return Properties(properties)


def tailor_properties(properties, ispace):
Copy link
Contributor Author

@FabioLuporini FabioLuporini Jan 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note for reviewers: finally moved to ir/support/properties as promised in an old PR

Copy link

codecov bot commented Jan 14, 2025

Codecov Report

Attention: Patch coverage is 76.92308% with 45 lines in your changes missing coverage. Please review.

Project coverage is 87.30%. Comparing base (f71764a) to head (b8de9ec).

Files with missing lines Patch % Lines
devito/arch/archinfo.py 42.42% 19 Missing ⚠️
devito/ir/support/properties.py 76.74% 6 Missing and 4 partials ⚠️
devito/passes/clusters/misc.py 78.57% 5 Missing and 4 partials ⚠️
devito/arch/compiler.py 42.85% 4 Missing ⚠️
tests/test_mpi.py 60.00% 2 Missing ⚠️
devito/types/dense.py 66.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2516      +/-   ##
==========================================
- Coverage   87.31%   87.30%   -0.02%     
==========================================
  Files         238      238              
  Lines       45847    45972     +125     
  Branches     4060     4074      +14     
==========================================
+ Hits        40033    40134     +101     
- Misses       5129     5150      +21     
- Partials      685      688       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

devito/arch/archinfo.py Show resolved Hide resolved
devito/arch/compiler.py Show resolved Hide resolved

# Process the `weak` part of the key
for i in reversed(range(len(k.weak) + 1)):
choosable = [e for e in candidates if m[e].weak[:i] == k.weak[:i]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont like this var name but ok

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the choosable among the candidates -- I think it's OK! why don't you like it? Im open to alternatives!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, not sure....maybe like filtered, valid, eligible ?

Copy link
Contributor

@mloubout mloubout left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments but looks straightforward to me.

devito/passes/clusters/misc.py Show resolved Hide resolved
devito/passes/clusters/misc.py Outdated Show resolved Hide resolved
devito/types/basic.py Show resolved Hide resolved
devito/arch/archinfo.py Outdated Show resolved Hide resolved
@@ -21,7 +21,10 @@
from devito.tools import Bunch

from examples.seismic.acoustic import acoustic_setup
from tests.test_dse import TestTTI
try:
from tests.test_dse import TestTTI
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just from .test_dse import TestTTI. If this file is run, test_dse exists there should not be any case where this cannot be imported here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually just from test_dse import TestTTI see e.g.

from test_dse import TestTTI

@mloubout mloubout force-pushed the async-loads-final-2 branch from 72f4701 to 82fa700 Compare January 14, 2025 19:30
# Asynchronous pipeline loads -- introduced in Ampere
return True
elif query == 'tma' and cc >= 90:
# Tensor Memory Acceleratory -- introduced in Hopper
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: "acceleratory"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'll postpone it until the next PR unless I'm really convinced a further push is required by this one

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

never mind. I had to push. Fixed

return False

cc = get_nvidia_cc()
if query == 'async-loads' and cc >= 80:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are the checks for CUDA compiler version not in the Ampere and Hopper classes?

Suggestion: why not something along the lines of

class NvidiaDevice(Device):
    _supports = ()
    _mincuda = None

    def supports(self, query, language=None):
        if language != 'cuda':
            return False

        cc = get_nvidia_cc()
        return query in _supports and (_mincuda is None or cc >= _mincuda)
 
class Volta(NvidiaDevice):
    pass

class Ampere(Volta):
    _supports = super()._supports + ('async-loads',)
    _mincuda = 80

class Hopper(Ampere):
    _supports = super()._supports + ('tma',)
    _mincuda = 90

class Blackwell(Hopper):
    pass

Whilst this could probably be made more elegant, it does reduce the number of lines of code and reduce code duplication.

Maybe this idea is too unsophisticated though, since one could conceivably have high-end hardware but outdated compilers

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you cannot use super() on class-level modules (attributes & methods)

and yes, you may have older / newer CUDA versions on different architectures

I suggest we refrain from over-engineering this until we'll truly need to do so (if ever, hopefully never) but I will consider push backs

@@ -730,6 +730,12 @@ def __init_finalize__(self, *args, **kwargs):

super().__init_finalize__(*args, **kwargs)

@classmethod
def class_key(cls):
# Ensure Weights appear before any other AbstractFunction
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed? Genuinely curious

Copy link
Contributor Author

@FabioLuporini FabioLuporini Jan 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sympy voodoo, sigh

to avoid that e.g. CUDA unvectorized code looks like a[i]*b[i] and like ab[i].y*ab[i].x once vectorized (ie, with operands flipped). In short, for consistency of the order of arithmetic operations

something along these lines

@FabioLuporini FabioLuporini force-pushed the async-loads-final-2 branch 2 times, most recently from 181b588 to 191a162 Compare January 23, 2025 09:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants