Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SYCL in homme #6569

Closed
wants to merge 103 commits into from
Closed

SYCL in homme #6569

wants to merge 103 commits into from

Conversation

oksanaguba
Copy link
Contributor

@oksanaguba oksanaguba commented Aug 28, 2024

Changes needed for spot/aurora: sycl kokkos spaces, temporary TP changes (TP init will be sorted out later), switch booleans to ints in F<-->xx interfaces.

[bfb]

@oksanaguba
Copy link
Contributor Author

@bartgol or @ambrad in this branch i revived possibility to use external kokkos build instead of pointing ekat to a new kokkos version. i still prefer to have such option, but do you object to it?

@ambrad
Copy link
Member

ambrad commented Aug 30, 2024

@oksanaguba is that for standalone Homme? I suspect that's an easier case for supporting an external Kokkos build than the CIME case. In any case, it's fine with me; I believe the option was lost by accident.

@bartgol
Copy link
Contributor

bartgol commented Sep 3, 2024

@oksanaguba You can tell cmake to use a prebuilt Kokkos, even with the current infrastructure. You should be able to do it via either an env or a CMake variable. E.g., via env var:

export Kokkos_ROOT=/home/lbertag/workdir/libs/ekat/ekat-install/branch

cmake <options> /path/to/source

For me this gives the output:

-- Looking for a Kokkos installation ...
-- Looking for a Kokkos installation ... FOUND
--   Kokkos_DIR: /home/lbertag/workdir/libs/ekat/ekat-install/branch/lib64/cmake/Kokkos

The same should be achievable by setting the CMake var Kokkos_ROOT (or it may be Kokkos_DIR in that case, I'm not sure). Either way, you always point to the install root folder, of course.

@oksanaguba
Copy link
Contributor Author

Ready for review, please, @bartgol and @ambrad (i will run tests meanwhile).

@oksanaguba oksanaguba changed the title [wip] sycl branch SYCL in homme Sep 5, 2024
@oksanaguba oksanaguba removed the request for review from rebustamante September 5, 2024 20:38
Copy link
Contributor

@bartgol bartgol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused by two things:

  1. the linker language for SUNSPOT
  2. all the bool conversions to int

I guess there may be some issues with bool on SYCL device (not sure if that's the case, but I can see something like that happening), so it's ok to create an int local var from a bool. But the rest of the infrastructure should use bool/logical for stuff that is conceptually a boolean.

SET_TARGET_PROPERTIES(${execName} PROPERTIES LINKER_LANGUAGE Fortran)

if(SUNSPOT_MACHINE)
SET_TARGET_PROPERTIES(${execName} PROPERTIES LINKER_LANGUAGE CXX)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This puzzles me. The linker language should always be the one of the main entry point (in this case, F90). Why do we need this?

@@ -0,0 +1,64 @@
#module restore
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious: what does "-aot" stand for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -0,0 +1,58 @@
#module restore
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And "-jit"? I doubt it is "Just In Time"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -16,7 +16,7 @@
namespace Homme {

void init_gllfvremap_c (int nelemd, int np, int nf, int nf_max,
bool theta_hydrostatic_mode,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming all these changes are b/c there's a problem getting bool values in a device lambda on SYCL, or something like that?

If that's the case, I would keep bool everywhere, and simply convert to int in the C++ structures that do need an int on SYCL. That would allow to leave lots of other files (including the F90 files) unchanged.

@@ -395,7 +395,7 @@ ::run_dyn_to_fv_phys (const int timeidx, const Phys1T& ps, const Phys1T& phis, c
const auto hvcoord = m_hvcoord;

const bool use_moisture = m_data.use_moisture;
const bool theta_hydrostatic_mode = m_data.theta_hydrostatic_mode;
const int theta_hydrostatic_mode = m_data.theta_hydrostatic_mode;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is the only place where you need this to be an int, right? I see it used inside the KOKKOS_LAMBDA below. Then, you can keep theta_hydrostatic_mode as a bool everywhere, and just use an int for the local var here.

#else
m_process_nh_vars = !params.theta_hydrostatic_mode;
//m_process_nh_vars = !params.theta_hydrostatic_mode;
if (params.theta_hydrostatic_mode){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just do

m_process_nh_vars = static_cast<int>(params.theta_hydrostatic_mode);

Faster to read, and makes your intent of simply changing the type clearer.

@@ -43,12 +43,13 @@ void init_simulation_params_c (const int& remap_alg, const int& limiter_option,
const Real& nu, const Real& nu_p, const Real& nu_q, const Real& nu_s, const Real& nu_div, const Real& nu_top,
const int& hypervis_order, const int& hypervis_subcycle, const int& hypervis_subcycle_tom,
const double& hypervis_scaling, const double& dcmip16_mu,
const int& ftype, const int& theta_adv_form, const bool& prescribed_wind, const bool& moisture, const bool& disable_diagnostics,
const bool& use_cpstar, const int& transport_alg, const bool& theta_hydrostatic_mode, const char** test_case,
const int& ftype, const int& theta_adv_form, const int& prescribed_wind, const int& use_moisture, const int& disable_diagnostics,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why changing the type of these boolean from bool to int?

! Initialize the C++ reference element structure (i.e., pseudo-spectral deriv matrix and ref element mass matrix)
dvv = deriv1%dvv
elem_mp = mp
call init_reference_element_c(c_loc(dvv),c_loc(elem_mp))

! Fill the simulation params structures in C++
test_name = TRIM(test_case) // C_NULL_CHAR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems overly complicated. We create an int from a bool to call the c function, and then inside the c function we convert the int back to a bool. Why not passing bool in the first place?

@@ -138,11 +138,11 @@ end subroutine init_reference_element_c

! Create C++ functors
subroutine init_functors_c (allocate_buffer) bind(c)
use iso_c_binding, only: c_bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also here, why the bool->int change?

@@ -17,6 +17,8 @@ ENDIF()
SET (USE_MPIEXEC "srun" CACHE STRING "")
SET (USE_MPI_OPTIONS "-K --cpu_bind=cores" CACHE STRING "")

SET (CHRYSALIS_MACHINE TRUE CACHE BOOL "")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this var ever being used. Can we rm from these two mach files?

@ambrad
Copy link
Member

ambrad commented Sep 6, 2024

Two points for consideration:

  1. I agree with Luca that it's not clear the bool->int conversion to work around the system issue is, in its current form, as minimal as it can be. It might be worth thinking about whether the workaround can be implemented with substantially fewer modifications.
  2. This PR has a lot of micro commits, 103. Ideally, these would be squashed to a few meaningful ones.

@oksanaguba
Copy link
Contributor Author

Thanks, Luca and Andrew. Luca used lazygit and made a new branch with a few commits out of this one. If you do not like int-bool changes, we can just wait till the issue resolves itself, and then redo this work. If possible, I would not want to spend more time cleaning that change (but i will check HV functor if i had an unnecessary change there).

@oksanaguba
Copy link
Contributor Author

see #6594 .

@oksanaguba oksanaguba closed this Sep 6, 2024
@oksanaguba oksanaguba mentioned this pull request Sep 6, 2024
oksanaguba added a commit that referenced this pull request Sep 20, 2024
Changes needed for spot/aurora: sycl kokkos spaces,
temporary TP changes (TP init will be sorted out later),
switch booleans to ints in F<-->xx interfaces.

see #6569 for the original branch/history.

[bfb]
oksanaguba added a commit that referenced this pull request Sep 21, 2024
Changes needed for spot/aurora: sycl kokkos spaces,
temporary TP changes (TP init will be sorted out later),
switch booleans to ints in F<-->xx interfaces.

see #6569 for the original branch/history.

[bfb]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants