-
Notifications
You must be signed in to change notification settings - Fork 231
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
compiler: Misc improvements to code generation #2516
base: master
Are you sure you want to change the base?
Conversation
@@ -540,19 +544,3 @@ def reduce_properties(clusters): | |||
properties[d] = normalize_properties(properties.get(d, v), v) | |||
|
|||
return Properties(properties) | |||
|
|||
|
|||
def tailor_properties(properties, ispace): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note for reviewers: finally moved to ir/support/properties as promised in an old PR
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #2516 +/- ##
==========================================
- Coverage 87.31% 87.30% -0.02%
==========================================
Files 238 238
Lines 45847 45972 +125
Branches 4060 4074 +14
==========================================
+ Hits 40033 40134 +101
- Misses 5129 5150 +21
- Partials 685 688 +3 ☔ View full report in Codecov by Sentry. |
|
||
# Process the `weak` part of the key | ||
for i in reversed(range(len(k.weak) + 1)): | ||
choosable = [e for e in candidates if m[e].weak[:i] == k.weak[:i]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont like this var name but ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the choosable among the candidates -- I think it's OK! why don't you like it? Im open to alternatives!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, not sure....maybe like filtered
, valid
, eligible
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor comments but looks straightforward to me.
@@ -21,7 +21,10 @@ | |||
from devito.tools import Bunch | |||
|
|||
from examples.seismic.acoustic import acoustic_setup | |||
from tests.test_dse import TestTTI | |||
try: | |||
from tests.test_dse import TestTTI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not just from .test_dse import TestTTI
. If this file is run, test_dse exists there should not be any case where this cannot be imported here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually just from test_dse import TestTTI
see e.g.
devito/tests/test_autotuner.py
Line 153 in da2c9a4
from test_dse import TestTTI |
1c11eae
to
72f4701
Compare
72f4701
to
82fa700
Compare
devito/arch/archinfo.py
Outdated
# Asynchronous pipeline loads -- introduced in Ampere | ||
return True | ||
elif query == 'tma' and cc >= 90: | ||
# Tensor Memory Acceleratory -- introduced in Hopper |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: "acceleratory"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'll postpone it until the next PR unless I'm really convinced a further push is required by this one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
never mind. I had to push. Fixed
return False | ||
|
||
cc = get_nvidia_cc() | ||
if query == 'async-loads' and cc >= 80: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are the checks for CUDA compiler version not in the Ampere
and Hopper
classes?
Suggestion: why not something along the lines of
class NvidiaDevice(Device):
_supports = ()
_mincuda = None
def supports(self, query, language=None):
if language != 'cuda':
return False
cc = get_nvidia_cc()
return query in _supports and (_mincuda is None or cc >= _mincuda)
class Volta(NvidiaDevice):
pass
class Ampere(Volta):
_supports = super()._supports + ('async-loads',)
_mincuda = 80
class Hopper(Ampere):
_supports = super()._supports + ('tma',)
_mincuda = 90
class Blackwell(Hopper):
pass
Whilst this could probably be made more elegant, it does reduce the number of lines of code and reduce code duplication.
Maybe this idea is too unsophisticated though, since one could conceivably have high-end hardware but outdated compilers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you cannot use super() on class-level modules (attributes & methods)
and yes, you may have older / newer CUDA versions on different architectures
I suggest we refrain from over-engineering this until we'll truly need to do so (if ever, hopefully never) but I will consider push backs
@@ -730,6 +730,12 @@ def __init_finalize__(self, *args, **kwargs): | |||
|
|||
super().__init_finalize__(*args, **kwargs) | |||
|
|||
@classmethod | |||
def class_key(cls): | |||
# Ensure Weights appear before any other AbstractFunction |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed? Genuinely curious
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sympy voodoo, sigh
to avoid that e.g. CUDA unvectorized code looks like a[i]*b[i]
and like ab[i].y*ab[i].x
once vectorized (ie, with operands flipped). In short, for consistency of the order of arithmetic operations
something along these lines
181b588
to
191a162
Compare
191a162
to
b8de9ec
Compare
In essence, this is a batch of tweaks to support GPU features in PRO