Make in-place FFT optional #155
@lukasm91, can you see this having any performance implications for CUDA/cuFFT execution? I think I've put all the expensive stuff inside the guards, so it shouldn't be compiled on an Nvidia platform.
Force-pushed from ce5c7cf to dea414f.
Yes, this looks like it should not have performance implications (I also quickly checked it on our machines).
Force-pushed from dea414f to 3c6810a.
This workaround (out-of-place FFT) is currently disabled for cuFFT but enabled for hipFFT. In-place FFTs seem to be an issue for ROCm at the moment. This is a temporary workaround.
Co-authored-by: lukasm91 <[email protected]>
Force-pushed from 1758aa1 to 89def35.
REAL(KIND=JPRBT) :: DUMMY

#ifndef IN_PLACE_FFT
HFTDIR%HREEL_COMPLEX = RESERVE(ALLOCATOR, INT(KF_FS*D%NLENGTF*SIZEOF(DUMMY), KIND=C_SIZE_T))
SIZEOF is a non-standard extension, so some compilers could complain. The standard alternatives in F2008 are STORAGE_SIZE, which gives you bits (not bytes!), and C_SIZEOF (from ISO_C_BINDING), which gives you bytes.
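To make the bits-versus-bytes distinction concrete, here is a minimal Fortran sketch comparing the two standard options for the size of one element. It assumes `JPRBT` is double precision (modelled here as `C_DOUBLE`); the module and function names are hypothetical and not part of ecTrans.

```fortran
! Minimal sketch; assumption: JPRBT is double precision (C_DOUBLE).
MODULE SIZE_INQUIRY_DEMO
  USE, INTRINSIC :: ISO_C_BINDING, ONLY: C_SIZE_T, C_DOUBLE, C_SIZEOF
  IMPLICIT NONE
  ! Hypothetical stand-in for ecTrans's JPRBT kind parameter.
  INTEGER, PARAMETER :: JPRBT = C_DOUBLE
CONTAINS
  ! C_SIZEOF (F2008, ISO_C_BINDING) already returns bytes, as C_SIZE_T.
  FUNCTION BYTES_VIA_C_SIZEOF() RESULT(NBYTES)
    INTEGER(C_SIZE_T) :: NBYTES
    REAL(KIND=JPRBT) :: DUMMY
    NBYTES = C_SIZEOF(DUMMY)
  END FUNCTION BYTES_VIA_C_SIZEOF

  ! STORAGE_SIZE (F2008 intrinsic) returns BITS, so divide by 8
  ! (assumes 8-bit bytes). KIND=C_SIZE_T avoids default-INTEGER results.
  FUNCTION BYTES_VIA_STORAGE_SIZE() RESULT(NBYTES)
    INTEGER(C_SIZE_T) :: NBYTES
    REAL(KIND=JPRBT) :: DUMMY
    NBYTES = STORAGE_SIZE(DUMMY, KIND=C_SIZE_T) / 8_C_SIZE_T
  END FUNCTION BYTES_VIA_STORAGE_SIZE
END MODULE SIZE_INQUIRY_DEMO
```

With either function, a byte count like the one in the RESERVE call could be formed as, for example, `INT(KF_FS, C_SIZE_T)*INT(D%NLENGTF, C_SIZE_T)*C_SIZEOF(DUMMY)`; this is one option, not necessarily what the PR adopted.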
SIZEOF occurs in a few places in the GPU tree. @lukasm91, any strong feelings about switching to STORAGE_SIZE(DUMMY)/8?
Thanks for letting me know! For me, Fortran is learning by doing; I simply didn't realize that this is not in the standard, so this is good to know, and it's good to learn the right way to do it :)
PREEL_COMPLEX => PREEL_REAL
#else
CALL ASSIGN_PTR(PREEL_COMPLEX, GET_ALLOCATION(ALLOCATOR, HFTDIR%HREEL_COMPLEX),&
& 1_C_SIZE_T, INT(KFIELD*D%NLENGTF*SIZEOF(PREEL_COMPLEX(1)),KIND=C_SIZE_T))
Also here.
REAL(KIND=JPRBT) :: DUMMY

#ifndef IN_PLACE_FFT
HFTINV%HREEL_REAL = RESERVE(ALLOCATOR, INT(D%NLENGTF*KF_FS*SIZEOF(DUMMY),KIND=C_SIZE_T))
And here.
PREEL_REAL => PREEL_COMPLEX
#else
CALL ASSIGN_PTR(PREEL_REAL, GET_ALLOCATION(ALLOCATOR, HFTINV%HREEL_REAL),&
& 1_C_SIZE_T, INT(KFIELD*D%NLENGTF*SIZEOF(PREEL_REAL(1)),KIND=C_SIZE_T))
And here. Perhaps there are more instances in the code; this looks like a typical copy-paste-edit line.
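Since the same `INT(N*SIZEOF(X), KIND=C_SIZE_T)` pattern repeats, one option is a small helper. The sketch below is hypothetical (`NBYTES_OF` is not an ecTrans routine, and `JPRBT` is assumed to be double precision). It does all multiplication in `C_SIZE_T`: if SIZEOF were simply replaced by `STORAGE_SIZE(DUMMY)/8` without a `KIND` argument, the whole product would be evaluated in default INTEGER and could overflow for buffers above about 2 GiB when default INTEGER is 32-bit. It also passes the whole pointer array rather than `PREEL_REAL(1)`: as we read the F2008 standard, STORAGE_SIZE returns the size of one element and, for a non-polymorphic type without deferred type parameters, its argument may be a disassociated pointer.

```fortran
MODULE BYTE_COUNT_HELPER
  USE, INTRINSIC :: ISO_C_BINDING, ONLY: C_SIZE_T, C_DOUBLE
  IMPLICIT NONE
  ! Hypothetical stand-in for ecTrans's JPRBT kind parameter.
  INTEGER, PARAMETER :: JPRBT = C_DOUBLE
CONTAINS
  ! Hypothetical helper: bytes needed for NFIELD*NLEN elements of the
  ! same type as PTR. All arithmetic is done in C_SIZE_T, so the result
  ! does not overflow default INTEGER for large buffers.
  FUNCTION NBYTES_OF(PTR, NFIELD, NLEN) RESULT(NBYTES)
    REAL(KIND=JPRBT), POINTER, INTENT(IN) :: PTR(:)
    INTEGER, INTENT(IN) :: NFIELD, NLEN
    INTEGER(C_SIZE_T) :: NBYTES
    ! STORAGE_SIZE gives bits per element; PTR need not be associated.
    NBYTES = INT(NFIELD, C_SIZE_T) * INT(NLEN, C_SIZE_T) &
           & * (STORAGE_SIZE(PTR, KIND=C_SIZE_T) / 8_C_SIZE_T)
  END FUNCTION NBYTES_OF
END MODULE BYTE_COUNT_HELPER
```

A call site could then pass `NBYTES_OF(PREEL_REAL, KFIELD, D%NLENGTF)` as the size argument of ASSIGN_PTR, assuming PREEL_REAL is a rank-1 REAL(KIND=JPRBT) pointer.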
… reverted once ecmwf-ifs#155 is merged
It seems that hipFFT has a problem with in-place FFTs. Running on LUMI-G (MI250x), this error is thrown at runtime:
Some investigations with @PaulMullowney showed that if you create a separate buffer for the output, the error vanishes.
With this change, we can finally run the optimised GPU version of ecTrans (now simply the GPU version of ecTrans) on AMD GPUs, at least on a single MI250x GPU on LUMI-G. Multi-GPU runs are still a work in progress.
This PR makes in-place FFT a compile-time option. For now, in-place FFT is enabled for cuFFT but disabled for hipFFT.
@PaulMullowney and I will see what we can do to flag this to hipFFT developers so we can eventually remove this option.
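The switch between in-place and out-of-place buffers can be sketched as follows. This is a simplified, hypothetical illustration: `SETUP_OUTPUT` is not an ecTrans routine, the compile-time `IN_PLACE_FFT` macro is modelled as a runtime LOGICAL so that both branches can be exercised, and a plain ALLOCATE stands in for the PR's RESERVE/ASSIGN_PTR allocator calls.

```fortran
MODULE FFT_BUFFER_SKETCH
  USE, INTRINSIC :: ISO_C_BINDING, ONLY: C_DOUBLE
  IMPLICIT NONE
  ! Hypothetical stand-in for ecTrans's JPRBT kind parameter.
  INTEGER, PARAMETER :: JPRBT = C_DOUBLE
CONTAINS
  ! Hypothetical sketch: choose the FFT output buffer. IN_PLACE mirrors
  ! the IN_PLACE_FFT macro, which is a preprocessor switch in the PR.
  SUBROUTINE SETUP_OUTPUT(IN_PLACE, PREEL_REAL, PREEL_COMPLEX)
    LOGICAL, INTENT(IN) :: IN_PLACE
    REAL(KIND=JPRBT), POINTER, INTENT(IN) :: PREEL_REAL(:)
    REAL(KIND=JPRBT), POINTER, INTENT(OUT) :: PREEL_COMPLEX(:)
    IF (IN_PLACE) THEN
      ! cuFFT path: the output aliases (overwrites) the input buffer.
      PREEL_COMPLEX => PREEL_REAL
    ELSE
      ! hipFFT workaround: a separate output buffer of the same length.
      ALLOCATE(PREEL_COMPLEX(SIZE(PREEL_REAL)))
    END IF
  END SUBROUTINE SETUP_OUTPUT
END MODULE FFT_BUFFER_SKETCH
```

In the out-of-place case the caller owns the extra buffer, which is the memory cost of the workaround.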
Note that the earlier HIPFFT_PARSE_ERRORs no longer appear with ROCm 6.0.2, which is the new default on LUMI-G.

Note that this requires testing on an Nvidia GPU. Hopefully we have not ruined things there.
This PR builds on #150 so best to merge that one first.