Remove sycl namespace alias #2971

WeiqunZhang · 2022-10-03T16:54:50Z

This causes a conflict with new compilers.

@afanfa

commit 10e99fb
Merge: d03045d f1e1d6f
Author: Andrew Myers
Date:   Wed Nov 2 14:06:00 2022 -0700

    Merge branch 'particle_soa_refactor' of github.com:Thierry992/amrex into HEAD

commit d03045d
Author: Andrew Myers
Date:   Wed Nov 2 14:04:23 2022 -0700

    fix buffer pack / unpack

commit d771fc8
Author: Andrew Myers
Date:   Wed Nov 2 14:04:08 2022 -0700

    revert to one int for each id for now

commit f1e1d6f
Merge: 4dbfbac c4a4811
Author: Axel Huebl
Date:   Tue Nov 1 15:18:54 2022 -0500

    Merge remote-tracking branch 'mainline/development' into particle_soa_refactor

commit c4a4811
Author: Axel Huebl
Date:   Tue Nov 1 14:08:38 2022 -0500

    C++17 Transition (AMReX-Codes#2992)

    ## Summary

    Update AMReX to require C++17 or newer.

    - [x] docs
    - [x] CMake
    - [x] GNUmake
    - [x] CI

    ## Additional background

    Requires a mature [C++17](https://en.wikipedia.org/wiki/C%2B%2B17) compiler, e.g., GCC 8, Clang 7, NVCC 11.0, MSVC 19.15 or newer.

    Already used in production for more than a year by downstream codes such as Castro and WarpX.

    Needed for modernization and new features such as AMReX-Codes#2878

    Co-authored-by: Weiqun Zhang

commit d2b8293
Author: Weiqun Zhang
Date:   Tue Nov 1 09:01:54 2022 -0700

    Update CHANGES for 22.11 (AMReX-Codes#3006)

commit 5ec270b
Author: Weiqun Zhang
Date:   Tue Nov 1 08:59:44 2022 -0700

    Fix compilation for PETSc (AMReX-Codes#3005)

    We cannot include PETSc headers too early because they might redefine MPI routines as macros (https://github.com/petsc/petsc/blob/main/include/petsclog.h#L441). Such macros break MPI calls like the one below,

        MPI_Allreduce(&tmp, &vi, 1, ParallelDescriptor::Mpi_typemap<T>::type(),
                      ParallelDescriptor::Mpi_op<T,amrex::Greater<T>>(), comm);

    because of the `,` in `<T,amrex::Greater<T>>`.
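
The comma causes the failure because the C preprocessor splits macro arguments at every top-level comma, and only parentheses group tokens. Angle brackets do not. The sketch below uses a hypothetical `COUNT_ARGS` macro (not part of AMReX, PETSc, or MPI) to count how many arguments the preprocessor sees for the call shape quoted above. It assumes a C++11-or-later preprocessor with variadic macros.

```cpp
// Hypothetical helper: COUNT_ARGS(...) expands to the number of macro
// arguments the preprocessor sees (valid for 1 to 8 arguments).
#define COUNT_ARGS_IMPL(_1, _2, _3, _4, _5, _6, _7, _8, N, ...) N
#define COUNT_ARGS(...) COUNT_ARGS_IMPL(__VA_ARGS__, 8, 7, 6, 5, 4, 3, 2, 1, 0)

// Same argument shape as the MPI_Allreduce call above. The tokens are only
// counted by the preprocessor and discarded, never compiled as C++.
// The comma inside <T,amrex::Greater<T>> is not protected, so 7 arguments.
constexpr int n_plain = COUNT_ARGS(&tmp, &vi, 1, Mpi_typemap<T>::type(),
                                   Mpi_op<T,amrex::Greater<T>>(), comm);

// Extra parentheses do protect it, so the same call is seen as 6 arguments.
constexpr int n_parenthesized = COUNT_ARGS(&tmp, &vi, 1, Mpi_typemap<T>::type(),
                                           (Mpi_op<T,amrex::Greater<T>>()), comm);
```

A function-like macro named `MPI_Allreduce` that expects six parameters would therefore reject the unparenthesized call. The commit avoids the problem differently, by not including the PETSc headers too early.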

commit 735c351
Author: Weiqun Zhang
Date:   Sat Oct 29 10:57:23 2022 -0700

    MPI Reduce for ValLocPair (AMReX-Codes#3003)

    Add ParallelReduce::Min, ParallelReduce::Max, ParallelAllReduce::Min, and ParallelAllReduce::Max for ValLocPair<TV,TI>, where TV and TI are types that have corresponding MPI types (e.g., int, Real, IntVect, Box, etc.).

commit 3ec0768
Author: Axel Huebl
Date:   Wed Oct 26 16:49:40 2022 -0700

    `FabArray::isDefined` (AMReX-Codes#2997)

    ## Summary

    Add a new query to `define_function_called`.

    ## Additional background

    This is a cheaper check than `ok()` for finding out if a MultiFab has been allocated or not yet, assuming that the calling code follows the convention that `define()` is called collectively.

    Update: It turns out you can also call `empty` inherited from `FabArrayBase`. The new API is quite explicit, which is ok, too.

    Co-authored-by: Weiqun Zhang

commit 7f3c908
Author: Weiqun Zhang
Date:   Wed Oct 26 16:40:16 2022 -0700

    Make The_Device_Arena non-managed (AMReX-Codes#2998)

    The_Device_Arena used to be a separate Arena. We changed it to be an alias of The_Arena to avoid memory fragmentation. However, the issue is we don't have an Arena that can allocate non-managed memory unless The_Arena is not managed. Because of performance concerns, we sometimes want to allocate non-managed memory. Therefore, we make The_Device_Arena an alias if and only if The_Arena is not managed.

commit ab8c892
Author: Weiqun Zhang
Date:   Wed Oct 26 15:59:39 2022 -0700

    Add alias template Gpu::NonManagedDeviceVector (AMReX-Codes#2999)

commit b3e0a62
Author: Weiqun Zhang
Date:   Wed Oct 26 15:02:13 2022 -0700

    Pre- and Post-interpolation hook interface (AMReX-Codes#2991)

    Support both Fab and MultiFab versions of pre- and post-interpolation hooks. Because the pre-interp hook might modify the data, we need to make a copy to avoid modifying cached coarse data.

    Close AMReX-Codes#2989.

commit 3082028
Author: Weiqun Zhang
Date:   Wed Oct 19 19:24:10 2022 -0700

    Update GitHub Actions (AMReX-Codes#2996)

    https://github.blog/changelog/2022-09-22-github-actions-all-actions-will-begin-running-on-node16-instead-of-node12/

    ## Summary

    ## Additional background

    ## Checklist

    The proposed changes:
    - [ ] fix a bug or incorrect behavior in AMReX
    - [ ] add new capabilities to AMReX
    - [ ] change answers in the test suite to more than roundoff level
    - [ ] are likely to significantly affect the results of downstream AMReX users
    - [ ] include documentation in the code and/or rst files, if appropriate

commit 0b88bfd
Author: Weiqun Zhang
Date:   Wed Oct 19 13:39:18 2022 -0700

    Add user defined BC types (AMReX-Codes#2995)

    Add BCType::user_1, BCType::user_2 and BCType::user_3. Previously the only "user" type was ext_dir (external Dirichlet). The BC types are passed from the user's code to FillPatch, which in turn passes them back to the user-provided BC filling function. These new types make it easy for users to identify their own BC types in their BC filling functions.

commit 9502b99
Author: Weiqun Zhang
Date:   Tue Oct 18 10:20:06 2022 -0700

    Add BCRec::set for convenience (AMReX-Codes#2993)

commit 4dbfbac
Author: Thierry Antoun
Date:   Mon Oct 17 15:05:54 2022 -0700

    Adding AMReX_RESTRICT for GPU Test

commit 7051a6c
Author: Thierry Antoun
Date:   Mon Oct 17 15:03:19 2022 -0700

    Modifying RedistributeMPI to make it work with 2 ranks

commit 56b6402
Author: Weiqun Zhang
Date:   Sat Oct 15 14:59:38 2022 -0700

    ParallelFor with compile time optimization of kernels with run time parameters (AMReX-Codes#2954)

    Branches inside ParallelFor can be very expensive.
    If a branch uses a lot of resources (e.g., registers), it can significantly affect performance even if the branch is never executed at run time, because it affects GPU occupancy. On CPUs, it can affect vectorization of the kernel.

    The new ParallelFor functions use a C++17 fold expression to generate kernel launches for all run time variants. Only one of them will be executed; which one is chosen at run time depends on the run time parameters. The kernel function can use constexpr if to discard unused code blocks for better run time performance. Here are two examples of how to use them.

        int runtime_option = ...;
        enum All_options : int { A0, A1, A2, A3};
        // Four ParallelFors will be generated.
        ParallelFor(TypeList<CompileTimeOptions<A0,A1,A2,A3>>{},
                    {runtime_option},
                    box, [=] AMREX_GPU_DEVICE (int i, int j, int k, auto control)
        {
            ...
            if constexpr (control.value == A0) {
                ...
            } else if constexpr (control.value == A1) {
                ...
            } else if constexpr (control.value == A2) {
                ...
            } else {
                ...
            }
            ...
        });

    and

        int A_runtime_option = ...;
        int B_runtime_option = ...;
        enum A_options : int { A0, A1, A2, A3};
        enum B_options : int { B0, B1 };
        // 4*2=8 ParallelFors will be generated.
        ParallelFor(TypeList<CompileTimeOptions<A0,A1,A2,A3>,
                             CompileTimeOptions<B0,B1> > {},
                    {A_runtime_option, B_runtime_option},
                    N, [=] AMREX_GPU_DEVICE (int i, auto A_control, auto B_control)
        {
            ...
            if constexpr (A_control.value == A0) {
                ...
            } else if constexpr (A_control.value == A1) {
                ...
            } else if constexpr (A_control.value == A2) {
                ...
            } else {
                ...
            }
            if constexpr (A_control.value != A3 && B_control.value == B1) {
                ...
            }
            ...
        });

    Note that, due to a limitation of CUDA's extended device lambda, the constexpr if block cannot be the one that captures a variable first. If nvcc complains about it, you will have to manually capture it outside constexpr if. The data type for the parameters is int.

    Thanks to Maikel Nadolski and Alex Sinn for showing us the meta-programming techniques used here.
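
The fold-expression dispatch described in 56b6402 can be sketched without AMReX. This is a minimal CPU-only sketch that assumes C++17. It is not AMReX's implementation. `Options`, `dispatch`, and `sum_with_option` are hypothetical stand-ins for `CompileTimeOptions`, the internal dispatch, and `ParallelFor`. The sketch reads the option through `decltype(control)::value`, which names the same constant as `control.value` in the examples above.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for CompileTimeOptions: a compile-time list of ints.
template <int... Vs>
struct Options {};

// Calls f(std::integral_constant<int, V>{}) for the V equal to runtime_value.
// The fold instantiates f once per candidate V (one "kernel" per variant),
// but || short-circuits, so at most one of them runs.
// Returns false if no candidate matched.
template <int... Vs, typename F>
bool dispatch(Options<Vs...>, int runtime_value, F&& f)
{
    return ((runtime_value == Vs ? (f(std::integral_constant<int, Vs>{}), true) : false) || ...);
}

enum Option : int { A0, A1, A2 };

// CPU stand-in for a ParallelFor over [0, n). Inside each generated variant
// the branch on the option is resolved at compile time.
int sum_with_option(int runtime_option, int n)
{
    int sum = 0;
    bool matched = dispatch(Options<A0, A1, A2>{}, runtime_option, [&](auto control) {
        constexpr int opt = decltype(control)::value;
        for (int i = 0; i < n; ++i) {
            if constexpr (opt == A0) {
                sum += i;
            } else if constexpr (opt == A1) {
                sum += 2 * i;
            } else {
                sum -= i;
            }
        }
    });
    assert(matched && "runtime_option must be one of the listed options");
    (void)matched;  // avoid an unused-variable warning when NDEBUG is set
    return sum;
}
```

The lambda body is instantiated three times, once per option, which mirrors the "Four ParallelFors will be generated" comment above. Each instantiation keeps only the branch that its `if constexpr` selects.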

commit bcbf17f
Author: Weiqun Zhang
Date:   Fri Oct 14 19:48:14 2022 -0700

    2D RZ solver for WarpX: Arbitrary coefficient (AMReX-Codes#2986)

    The 2D RZ solver for WarpX used to assume that there was no sigma_r (i.e., sigma_r == 1). In this PR, we allow an arbitrary sigma_r coefficient.

commit 9a3cd5d
Author: Axel Huebl
Date:   Fri Oct 14 17:27:41 2022 -0700

    CMake Docs: Fix User-Guidance (Link) (AMReX-Codes#2990)

    Update the user guidance on CMake dependency linking to CMake 3.0+ (anno. 2014+).

    Seen in AMReX-Codes#2978

commit 1ad4144
Author: Weiqun Zhang
Date:   Fri Oct 14 10:36:17 2022 -0700

    Runge-Kutta support for AMR (AMReX-Codes#2974)

    This adds RK2, RK3 and RK4 in a new namespace RungeKutta. Together with the enhanced FillPatcher class, these functions can be used for RK time stepping in AMR simulations. A new function AmrLevel::RK is added for AmrLevel based codes. See CNS::advance in Tests/GPU/CNS/CNS_advance.cpp for an example of using the new AmrLevel::RK function.

    The main motivation for this PR is that ghost cell filling for high order (> 2) RK methods at coarse/fine boundaries is non-trivial when there is subcycling.

    Co-authored-by: Jean M. Sexton

commit c841ae8
Author: Weiqun Zhang
Date:   Fri Oct 14 10:03:34 2022 -0700

    Fourth-order interpolation from fine to coarse level (AMReX-Codes#2987)

    For fourth-order finite-difference methods with data at cell centers, we cannot use the usual averageDown function to overwrite coarse level data with fine data. We actually need to do interpolation.

commit 975b830
Author: Weiqun Zhang
Date:   Fri Oct 14 09:53:22 2022 -0700

    Fix EB data inconsistency when fixing small cells and multiple cuts (AMReX-Codes#2943)

    ## Summary

    For consistency, we need to call the function that zeros out the level set even if that box does not have any small cells or multiple cuts.
    This is because a node could exist in multiple boxes. Furthermore, a covered cell or covered face may have a node with a level set < 0.

    ## Additional background

    This is usually not an issue. However, in WarpX, we use the level set to decide whether a node is an unknown in the linear system. The inconsistency makes the solver fail in some cases.

commit 9c2264b
Author: Axel Huebl
Date:   Fri Oct 14 07:41:06 2022 -0700

    `MFIter::Finalize`: Free `m_fa` (AMReX-Codes#2988)

    This `free` should potentially not be delayed until the destructor is called.

    Follow-up to AMReX-Codes#2985 AMReX-Codes#2983

commit f84c7a8
Author: Weiqun Zhang
Date:   Wed Oct 12 10:44:11 2022 -0700

    Fix MLMG::getGradSolution & getFluxes for inhomogeneous Neumann and Robin BC (AMReX-Codes#2984)

    Because of the way inhomogeneous Neumann and Robin BCs are handled, we must add the inhomogeneous fluxes back; otherwise they would be zero at those boundaries.

commit ed1ecd6
Author: Axel Huebl
Date:   Wed Oct 12 08:46:34 2022 -0700

    MFIter: Make Finalize Public (AMReX-Codes#2985)

    Follow-up to AMReX-Codes#2983

commit 5acfe07
Author: Axel Huebl
Date:   Tue Oct 11 14:51:48 2022 -0700

    MFIter::Finalize (AMReX-Codes#2983)

    Add a Finalize function to MFIter. The idea is that we can call it before destruction in Python, where `for` loops do not create a scope.

    This function must be robust enough to be called again in the destructor (or we need to add an extra bool to guard that it is not called again in the destructor).

    Co-authored-by: Weiqun Zhang

commit 53e34d1
Author: Andy Nonaka
Date:   Tue Oct 11 12:00:34 2022 -0700

    fix docs; Robin BC's for MLMG (AMReX-Codes#2982)

    Update the MLMG Robin BC description in the docs.

commit 0019b3a
Author: Weiqun Zhang
Date:   Tue Oct 11 11:00:13 2022 -0700

    MLLinOp::postSolve (AMReX-Codes#2981)

    Add a virtual function MLLinOp::postSolve.
    This allows WarpX to set EB covered nodes to prescribed values in the solver's output for visualization purposes.

commit 2d87a4c
Author: Brandon Runnels
Date:   Mon Oct 10 09:49:29 2022 -0600

    add templating for the cell bilinear interpolators (AMReX-Codes#2979)

    This templates the `mf_cell_bilin_interp` functions so that the interpolators can be used with `BaseFab`s of arbitrary type.

commit e4ab048
Author: Weiqun Zhang
Date:   Wed Oct 5 12:03:41 2022 -0700

    FillPatcher class (AMReX-Codes#2972)

    This adds a class FillPatcher for filling fine level data. It's not as general as the various FillPatch functions (e.g., FillPatchTwoLevels). However, it can reduce the amount of communication data.

    Suppose we use RK2 with subcycling and the refinement ratio is 2. For each step on level 0, there are two steps on level 1. With RK2, each fine step needs to call FillPatch twice. So the total number of FillPatch calls is 4 in the two fine steps. Using the free function, one ParallelCopy per FillPatch call is needed for copying coarse data for spatial interpolation. With the FillPatcher class, two ParallelCopy calls will be done to copy old and new coarse data. Then these data will be used in the four FillPatcher::fill calls. This new approach saves two ParallelCopy calls per coarse step for a two-level run. It could save more if the time stepping requires more substeps or the refinement ratio is higher.

    Note that many of our AMReX codes use a time stepping algorithm that needs only one FillPatch call per step. For those codes, this new approach will not save any communication for a refinement ratio of 2. However, it will save communication when the refinement ratio is 4.

commit 1bc4e4e
Author: Weiqun Zhang
Date:   Mon Oct 3 16:50:45 2022 -0700

    Remove sycl namespace alias (AMReX-Codes#2971)

    This causes a conflict with new compilers.
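
The ParallelCopy counts in the FillPatcher commit (e4ab048) follow from simple arithmetic. The sketch below encodes them under a simplified model, which is an assumption of this example. With subcycling, one coarse step contains `ratio` fine steps, and each fine step calls FillPatch `fills_per_fine_step` times. `CopyCounts` and `parallel_copies_per_coarse_step` are hypothetical names.

```cpp
// Simplified two-level counting model (an assumption for illustration).
struct CopyCounts {
    int free_functions;  // ParallelCopy calls with the free FillPatch functions
    int fill_patcher;    // ParallelCopy calls with a FillPatcher object
};

constexpr CopyCounts parallel_copies_per_coarse_step(int ratio, int fills_per_fine_step)
{
    // Free functions: one ParallelCopy of coarse data per FillPatch call,
    // and there are ratio * fills_per_fine_step calls per coarse step.
    // FillPatcher: copy the old and the new coarse data once (2 calls),
    // then reuse them for every FillPatcher::fill call.
    return CopyCounts{ratio * fills_per_fine_step, 2};
}
```

Consider `(ratio, fills_per_fine_step)` equal to `(2, 2)` (RK2), `(2, 1)`, and `(4, 1)`. The free functions need 4, 2, and 4 ParallelCopy calls per coarse step, versus 2 for FillPatcher. This reproduces the savings of 2, 0, and 2 described in the commit message.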

commit de7b7f4
Author: Weiqun Zhang
Date:   Mon Oct 3 14:06:58 2022 -0700

    Fix Tensor Solver BC (AMReX-Codes#2930)

    This fixes some bugs in the physical domain BC of the tensor linear solver.

    At the corner of two no-slip walls (e.g., (0,0)), we have u(-1,0) = -u(0,0) and u(0,-1) = -u(0,0). It's incorrect to fill the corner ghost cell with u(-1,-1) = u(-1,0) + u(0,-1) - u(0,0), because it will result in u(-1,-1) = -3 * u(0,0).

    In the old approach, to avoid branches in computing transverse derivatives on cell faces, we fill the ghost cells first. For example, to compute du/dy at the lo-x boundary, we use the data in i = -1 and 0, just like we compute du/dy(i) using u(i-1) and u(i) for interior faces. The problem is that the normal velocity in the ghost cells outside a wall is filled by extrapolation from the Dirichlet value (which is zero) and more than one interior cell. Because of the high-order extrapolation, u(-1) != -u(0). This is the desired approach for computing du/dx on the wall. However, it produces incorrect results in du/dy.

    In the new approach, we explicitly handle the boundaries in the derivative stencil. For example, to compute transverse derivatives on an inflow face, we use the boundary values directly.

    Co-authored-by: cgilet

commit 13aa4df
Author: Weiqun Zhang
Date:   Fri Sep 30 17:48:22 2022 -0700

    Disable host device for macros for SYCL/DPC++ (AMReX-Codes#2969)

    The host part of the AMREX_HOST_DEVICE_FOR_* macros is disabled for SYCL/DPC++ because it is very slow to compile.
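
The factor of 3 in the tensor-solver commit (de7b7f4) comes from substituting the two no-slip reflection conditions into the corner fill:

```latex
\begin{aligned}
u(-1,0) &= -u(0,0), \qquad u(0,-1) = -u(0,0), \\
u(-1,-1) &= u(-1,0) + u(0,-1) - u(0,0) \\
         &= -u(0,0) - u(0,0) - u(0,0) = -3\,u(0,0).
\end{aligned}
```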

commit 62379fb
Author: Weiqun Zhang
Date:   Fri Sep 30 15:37:35 2022 -0700

    Update CHANGES for 22.10 (AMReX-Codes#2968)

commit d65e09e
Author: Roberto Porcu
Date:   Thu Sep 29 15:46:19 2022 -0400

    Solve an issue with particles async IO when having runtime added variables (AMReX-Codes#2966)

commit cd07b0d
Author: Weiqun Zhang
Date:   Wed Sep 28 09:20:42 2022 -0700

    Fix int overflow in amrex::bisect (AMReX-Codes#2964)

    Change from (lo+hi)/2 to lo+(hi-lo)/2. Although it's very unlikely, it's possible that (lo+hi), where both lo and hi are integers, could overflow.

commit e55d6b4
Author: Junghyeon Park
Date:   Thu Sep 29 01:20:15 2022 +0900

    Update the SWFFT project site (AMReX-Codes#2965)

commit b84d7c0
Author: Weiqun Zhang
Date:   Mon Sep 26 16:05:10 2022 -0700

    Fix MLEBNodeFDLaplacian bottom solver (AMReX-Codes#2963)

    MLEBNodeFDLaplacian is never singular because it has a Dirichlet boundary on the EB surface. We did set the singular flag to false, but forgot that the bottom solver uses a different function to query it. This fixes it by overriding the isBottomSingular function.

commit 5e84f43
Author: asalmgren
Date:   Sun Sep 25 09:38:51 2022 -0700

    make tagging routines EB_aware (AMReX-Codes#2962)

commit 8b367b0
Author: Weiqun Zhang
Date:   Sun Sep 25 09:22:13 2022 -0700

    Volume weighted sum (AMReX-Codes#2961)

    Add a new function doing a volume weighted sum across AMR levels. This may not be exactly what amrex application codes want, but it should work for many cases.

commit 2a3cc05
Author: Weiqun Zhang
Date:   Fri Sep 23 12:24:05 2022 -0700

    CellData: data in a single cell (AMReX-Codes#2959)

    This adds struct CellData that allows for accessing data in a single cell in Array4. This is sometimes convenient because one can omit the i, j and k indices.
    It might also be faster sometimes because it can skip the repeated index calculation involving i, j, k.

commit 27ef106
Author: Weiqun Zhang
Date:   Fri Sep 23 12:23:34 2022 -0700

    Quartic interpolation for cell centered data (AMReX-Codes#2960)

    New Interpolator for interpolation of cell centered data using a fourth-degree polynomial. Note that the interpolation is not conservative and does not do any slope limiting.

commit c4b7982
Author: Luca Fedeli
Date:   Fri Sep 23 21:17:12 2022 +0200

    Add GPU-compatible upper bound and lower bound algorithms to AMReX_Algorithm (AMReX-Codes#2958)

commit 3e5cc77
Author: Don E. Willcox
Date:   Tue Sep 20 17:59:48 2022 -0700

    add option for makebuildsources to specify the style arguments for 'git describe'. (AMReX-Codes#2957)

commit a6e0c11
Author: Weiqun Zhang
Date:   Tue Sep 20 10:01:21 2022 -0700

    Add more warnings (AMReX-Codes#2956)

    * Add -Wnon-virtual-dtor -Wlogical-op -Wmisleading-indentation -Wduplicated-cond -Wduplicated-branches to gcc.

    * Add -Wnon-virtual-dtor to clang.

    * Add more warnings to CI.

    * Fix some non-virtual dtors and some other warnings.

commit 826cd37
Author: Phil Miller
Date:   Thu Sep 15 17:26:00 2022 -0700

    Add roundoff_lo corresponding to roundoff_hi for domains that don't start at 0 (AMReX-Codes#2950)

    * Lay groundwork for roundoff_lo
    * Add dummy implementation of roundoff_lo computation
    * implement bisect_prob_lo
    * change idx -> dxinv
    * use rlo instead of plo in locateParticle

    Co-authored-by: atmyers

commit 6a5a056
Author: Weiqun Zhang
Date:   Thu Sep 15 13:23:40 2022 -0700

    Add template parameter to ParallelFor and launch specifying block size (AMReX-Codes#2947)

    By default, amrex::ParallelFor launches AMREX_GPU_MAX_THREADS threads per block.
    We can now explicitly specify the block size with `ParallelFor<BLOCK_SIZE>(...)`, where BLOCK_SIZE should be a multiple of the warp size (e.g., 64, 128, etc.). A similar change has also been made to `launch`. The changes are backward compatible.

commit 2cdb9df
Author: Andrew Myers
Date:   Thu Sep 15 10:55:41 2022 -0700

    Byte spread fixes (AMReX-Codes#2949)

commit 17c94cc
Author: Candace Gilet
Date:   Wed Sep 14 11:49:35 2022 -0400

    Correct MultiFab::norm0 doxygen brief description (AMReX-Codes#2946)

commit 0351c99
Author: Axel Huebl
Date:   Wed Sep 14 08:48:25 2022 -0700

    CMake: HIP_PATH from ROCM_PATH (AMReX-Codes#2948)

    * On machines like Crusher, `ROCM_PATH` is more likely to be available than a `HIP_PATH` environment variable. This is mainly needed for our hacky ROCTX hints.

    * ROCTX: New Include

      Supposedly, there is a new include we should use: Ref.: ROCm/roctracer#79

    * ROCtracer: Include as System library

      Because of GNU extensions in the roctracer include files for the legacy include. But we should make this `-isystem` anyway to be robust for the future.
      The 5.2 deprecated include file `<roctracer_ext.h>` throws warnings because it relies on GNU extensions:

      ```
      In file included from /opt/rocm/hip/../roctracer/include/ext/prof_protocol.h:27:
      /opt/rocm/hip/../roctracer/include/ext/../../../include/roctracer/ext/prof_protocol.h:70:7: warning: anonymous structs are a GNU extension [-Wgnu-anonymous-struct]
            struct {
            ^
      /opt/rocm/hip/../roctracer/include/ext/../../../include/roctracer/ext/prof_protocol.h:70:7: warning: anonymous types declared in an anonymous union are an extension [-Wnested-anon-types]
      /opt/rocm/hip/../roctracer/include/ext/../../../include/roctracer/ext/prof_protocol.h:75:7: warning: anonymous types declared in an anonymous union are an extension [-Wnested-anon-types]
            struct {
            ^
      /opt/rocm/hip/../roctracer/include/ext/../../../include/roctracer/ext/prof_protocol.h:82:7: warning: anonymous structs are a GNU extension [-Wgnu-anonymous-struct]
            struct {
            ^
      /opt/rocm/hip/../roctracer/include/ext/../../../include/roctracer/ext/prof_protocol.h:86:7: warning: anonymous structs are a GNU extension [-Wgnu-anonymous-struct]
            struct {
            ^
      /opt/rocm/hip/../roctracer/include/ext/../../../include/roctracer/ext/prof_protocol.h:90:7: warning: anonymous structs are a GNU extension [-Wgnu-anonymous-struct]
            struct {
            ^
      /opt/rocm/hip/../roctracer/include/ext/../../../include/roctracer/ext/prof_protocol.h:82:7: warning: anonymous types declared in an anonymous union are an extension [-Wnested-anon-types]
            struct {
            ^
      /opt/rocm/hip/../roctracer/include/ext/../../../include/roctracer/ext/prof_protocol.h:86:7: warning: anonymous types declared in an anonymous union are an extension [-Wnested-anon-types]
            struct {
            ^
      /opt/rocm/hip/../roctracer/include/ext/../../../include/roctracer/ext/prof_protocol.h:90:7: warning: anonymous types declared in an anonymous union are an extension [-Wnested-anon-types]
            struct {
            ^
      ```

    * GNUmake: Update Includes in `hip.mak`

      Use public prefix.

commit 9aa23c2
Author: Cody Balos
Date:   Mon Sep 12 11:49:37 2022 -0700

    Fix minor typo in fcompare docs (AMReX-Codes#2945)

commit bfbd68f
Author: Axel Huebl
Date:   Mon Sep 12 11:40:55 2022 -0700

    Fix: Make Finalize->Initialize->F->I->... Work (AMReX-Codes#2944)

    Fix assertions in Arena::Initialize.

    The_BArena never dies (tm)

    Co-authored-by: Weiqun Zhang

commit 6738470
Author: Weiqun Zhang
Date:   Wed Sep 7 14:12:34 2022 -0700

    Changes for Cray & Clang (AMReX-Codes#2941)

    * It seems that the new Cray compilers no longer define `_CRAYC`. However, they do define `__cray__`.

    * For Clang based Cray compilers, use -O3 instead of -O2 for optimization.

    * Clang's vectorization pragma is very aggressive. For some codes, it makes ParallelFor with many if statements on CPU much slower than without vectorization. Unfortunately, Clang does not have an ivdep pragma, so we disable AMREX_PRAGMA for Clang for safety.

    * No longer need to use -Wno-pass-failed for Clang based compilers.

commit 5b0c598
Author: Weiqun Zhang
Date:   Wed Sep 7 09:42:57 2022 -0700

    Fix a warning in packing communication send buffer (AMReX-Codes#2940)

    When we communicate double precision data in single precision, there is a conversion from double to float in packing the send buffer. A static cast is added to fix the warning.
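
Here is a minimal sketch of the narrowing conversion that 5b0c598 makes explicit. It assumes a fixed-size array of doubles being packed into a single-precision buffer. `pack_as_float` is a hypothetical helper, not AMReX's packing code, which the commit does not show. The explicit `static_cast<float>` states that the precision loss is intended. That is what silences conversion warnings such as those enabled by `-Wconversion`.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: packs double-precision values into a float buffer.
template <std::size_t N>
constexpr std::array<float, N> pack_as_float(const std::array<double, N>& src)
{
    std::array<float, N> buf{};
    for (std::size_t i = 0; i < N; ++i) {
        // Intentional double -> float narrowing, spelled out so the compiler
        // does not warn about an implicit precision-losing conversion.
        buf[i] = static_cast<float>(src[i]);
    }
    return buf;
}
```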

commit 3e397bb
Author: Weiqun Zhang
Date:   Wed Sep 7 09:13:53 2022 -0700

    Link to cublas when using CUDA and Hypre (AMReX-Codes#2933)

commit 9525ea8
Author: Weiqun Zhang
Date:   Wed Sep 7 09:13:20 2022 -0700

    HIP: use coarse grained host memory (AMReX-Codes#2932)

commit 7e04016
Author: Marco Garten
Date:   Wed Sep 7 08:53:20 2022 -0700

    Update Testing Docs (AMReX-Codes#2937)

    - document `abort_on_unused_inputs`
    - remove duplicate superfluous argument in regtest call

commit 539427a
Author: drangara
Date:   Tue Sep 6 18:13:42 2022 -0400

    EB checkpoint files (AMReX-Codes#2897)

    * support for loading EB from checkpoint file
    * add support for writing chkpt file as well

    Co-authored-by: Weiqun Zhang

commit 35ed6b4
Author: Axel Huebl
Date:   Tue Sep 6 15:07:16 2022 -0700

    Fix: Loading Files Again (AMReX-Codes#2936)

    This enables `amrex::ParmParse::addfile` to be called multiple times. Before this, we accidentally overwrote the `FILE` static keyword.

    Follow-up to AMReX-Codes#2842

commit 8f8198c
Author: hengjiew
Date:   Tue Sep 6 13:36:35 2022 -0400

    Check if boundary particles container has been created before clearance. (AMReX-Codes#2935)

    This fixes a segmentation fault when using more GPUs for updating particles than fluid.

commit fb0b31e
Author: Nuno Miguel Nobre
Date:   Sun Sep 4 05:18:49 2022 +0100

    SYCL: Replace deprecated atomic types and operations (AMReX-Codes#2921)

    * SYCL: Replace deprecated atomic types and operations

    * Change atomic refs to device memory scope

    When using the relaxed memory order, the memory scope is ignored. Thus, for cosmetic reasons only, we set the memory scope to device, the broadest option when using the global address space.

    Co-authored-by: Weiqun Zhang

commit cc3cd14
Author: Weiqun Zhang
Date:   Thu Sep 1 07:39:25 2022 -0700

    Update CHANGES for 22.09 (AMReX-Codes#2934)

commit acc223f
Author: Weiqun Zhang
Date:   Tue Aug 30 16:04:43 2022 -0700

    Add hypre as an option for OpenBCSolver (AMReX-Codes#2931)

commit 3d29fd7
Author: hengjiew
Date:   Wed Aug 24 16:10:22 2022 -0400

    Preserve neighbor particles when sorting particles. (AMReX-Codes#2923)

commit 8294c3a
Author: Weiqun Zhang
Date:   Mon Aug 22 10:46:05 2022 -0700

    Scope of NonLocalBC::ParallelCopy (AMReX-Codes#2922)

    Make NonLocalBC::ParallelCopy accessible in namespace amrex, because it can be useful in situations other than non-local BC.

commit 0911fc4
Author: Weiqun Zhang
Date:   Sun Aug 21 18:13:07 2022 -0700

    Open Boundary Poisson Solver (AMReX-Codes#2912)

    This adds an open boundary Poisson solver based on James's algorithm. To use it, the user builds an amrex::OpenBCSolver object, which can be reused until the grids change, and then calls OpenBCSolver::solve.

    Currently, this is for 3D cell-centered data only. The solver works on CPU, NVIDIA GPUs, and AMD GPUs. The SYCL versions of a couple of kernels for Intel GPUs are yet to be implemented.

commit f270b3d
Author: Marc T. Henry de Frahan
Date:   Thu Aug 18 13:51:56 2022 -0600

    Fix OOB access of ref ratio on HDF write header (AMReX-Codes#2919)

commit fa8e20f
Author: Jean M. Sexton
Date:   Thu Aug 18 08:57:51 2022 -0700

    Add Polaris to GNUMake (AMReX-Codes#2908)

commit bd5f6a9
Author: Axel Huebl
Date:   Mon Aug 15 14:24:21 2022 -0700

    Export GpuDevice Globals (AMReX-Codes#2918)

    * Export GpuDevice Globals

      Implement symbol export via `AMREX_EXPORT` for the global variables in `Src/Base/AMReX_GpuDevice.H`.

      Follow-up to AMReX-Codes#1847
      Fix AMReX-Codes#2917

    * Fix: Export `AMReX::m_instance`

commit 4f63929
Author: asalmgren
Date:   Sat Aug 13 09:00:02 2022 -0700

    enable LinOp to use the right Factory (fixes moving geometry problem) (AMReX-Codes#2916)

commit 6593518
Author: Andrew Myers
Date:   Thu Aug 11 15:24:16 2022 -0700

    Use 1 atomic instead of two per item in DenseBins::build (AMReX-Codes#2911)

commit d295f22
Author: Nuno Miguel Nobre
Date:   Thu Aug 11 03:40:09 2022 +0100

    [SYCL] Remove amrex::oneapi and update deprecated device descriptors (AMReX-Codes#2910)

    * Remove amrex::oneapi in favour of standard features
    * Change deprecated device descriptors

commit 1bda173
Author: Axel Huebl
Date:   Wed Aug 10 15:46:43 2022 -0600

    Add: `MultiFab::sum_unique` (AMReX-Codes#2909)

    This provides a new method to sum values in a `MultiFab`. For non-cell-centered data, `MultiFab::sum` double counts box boundary values that are owned by multiple boxes. The new function does not double count these values and provides a quick way to get the sum of only the physically unique values.

    Co-authored-by: Weiqun Zhang

commit 3f715d2
Author: Candace Gilet
Date:   Mon Aug 8 14:40:28 2022 -0400

    In MLMG::mgFcycle, assert that for EB the linop is cell-centered. (AMReX-Codes#2905)

commit 59b0742
Author: hengjiew
Date:   Mon Aug 8 14:17:57 2022 -0400

    Clear the boundary particle indices' container before updating it. (AMReX-Codes#2907)

    This avoids potential segmentation faults when all of one grid's particles move to other grids.

commit 103db6e
Author: Weiqun Zhang
Date:   Fri Aug 5 15:25:33 2022 -0700

    EB: Add Fine Levels (AMReX-Codes#2881)

    Add a new function EB2::addFineLevels() that can be used to add more fine levels to the existing EB IndexSpace without changing the coarse levels. This is useful for restarting with a larger amr.max_level.
    The issue is that we build the EB at the finest level first and then coarsen it to the coarse levels. If the restart run has a different finest level, the EB on the coarse levels could differ without this new capability.

commit 6ebf8ff
Author: Jon Rood
Date:   Thu Aug 4 14:32:59 2022 -0600

    Add rpath to lib64 for ZFP. (AMReX-Codes#2902)

commit ed23627
Author: Yadong_Zeng
Date:   Thu Aug 4 16:32:21 2022 -0400

    change data types from double to amrex::Real, and thus we can use single precision for the hypre IJ interface (AMReX-Codes#2896)

    Co-authored-by: yzeng

commit 9ed4f59
Author: Weiqun Zhang
Date:   Wed Aug 3 16:53:20 2022 -0700

    Fix a new bug introduced in AMReX-Codes#2858 (AMReX-Codes#2901)

    We need to take into account that `amrex::Any` stores `MultiFab&` or `MultiFab const&`.

commit 6eaab8c
Author: Weiqun Zhang
Date:   Wed Aug 3 13:39:44 2022 -0700

    MPMD Support (AMReX-Codes#2895)

    Add support for multiple programs multiple data (MPMD). For now, we assume there are only two programs (i.e., executables) in the MPMD mode. During initialization, MPI_COMM_WORLD is split into two communicators. The MPMD::Copier class can be used to copy FabArray/MultiFab data between the two programs. This new capability can be used by FHDeX to couple FHD with SPPARKS.

commit 9469329
Author: Weiqun Zhang
Date:   Mon Aug 1 09:43:21 2022 -0700

    MLMG interface (AMReX-Codes#2858)

    These changes are made to support a generic type (i.e., amrex::Any) in MLMG. This is still a work in progress, but it should not break any existing codes.

commit 5a3b303
Author: Weiqun Zhang
Date:   Mon Aug 1 09:34:44 2022 -0700

    Update CHANGES for 22.08 (AMReX-Codes#2894)

commit 48702b4
Author: hengjiew
Date:   Thu Jul 28 14:14:19 2022 -0400

    Let `selectActualNeighbors` return right after starting if there are no particles for communication. (AMReX-Codes#2886)
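
The double counting that `MultiFab::sum_unique` (1bda173) avoids can be seen in a 1D toy model of nodal data. This sketch illustrates the idea only and is not how AMReX implements it. Each hypothetical `NodalBox` covers an inclusive range of node indices, and neighboring boxes share an end node. A shared node is owned by the first box that contains it, which is an arbitrary rule chosen here.

```cpp
#include <array>
#include <cstddef>

// Hypothetical 1D nodal box: an inclusive range of node indices.
struct NodalBox { int lo; int hi; };

// Sums every node of every box, so shared boundary nodes are counted twice.
template <std::size_t NB, std::size_t NN>
constexpr double sum_all(const std::array<NodalBox, NB>& boxes,
                         const std::array<double, NN>& data)
{
    double s = 0.0;
    for (std::size_t k = 0; k < NB; ++k) {
        for (int i = boxes[k].lo; i <= boxes[k].hi; ++i) {
            s += data[static_cast<std::size_t>(i)];
        }
    }
    return s;
}

// Counts each node once: it is skipped if an earlier box already contains it.
template <std::size_t NB, std::size_t NN>
constexpr double sum_unique_nodes(const std::array<NodalBox, NB>& boxes,
                                  const std::array<double, NN>& data)
{
    double s = 0.0;
    for (std::size_t k = 0; k < NB; ++k) {
        for (int i = boxes[k].lo; i <= boxes[k].hi; ++i) {
            bool seen_before = false;
            for (std::size_t m = 0; m < k; ++m) {
                if (boxes[m].lo <= i && i <= boxes[m].hi) { seen_before = true; }
            }
            if (!seen_before) { s += data[static_cast<std::size_t>(i)]; }
        }
    }
    return s;
}
```

Take boxes covering nodes [0, 4] and [4, 8], with every value equal to 1. `sum_all` returns 10 because node 4 is counted twice. `sum_unique_nodes` returns 9, the number of distinct nodes.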

commit 6a47d89
Author: kngott
Date:   Wed Jul 27 17:03:04 2022 -0700

    Add Comm Sync to Redistribute (AMReX-Codes#2891)

commit 51542c8
Author: philip-blakely
Date:   Wed Jul 27 17:29:26 2022 +0100

    Multi-materials and derived variable output (AMReX-Codes#2888)

    ## Summary

    Output small plots if only derived variables are specified. Also, make DeriveFuncFab a std::function<> instead of a plain function pointer.

    ## Additional background

    We have been implementing small plots for outputting variables at gauges (e.g., pressure at specific gauge locations). We may want to output only the derived variable pressure, and not all state variables. The if-condition was incorrect in this case.

    Further, multi-material simulations require a material index in order to compute derived variables, in addition to existing parameters. Making DeriveFuncFab a std::function is sufficient for our purposes.

commit ce0fb74
Author: Andrew Myers
Date:   Tue Jul 26 16:20:38 2022 -0700

    Fix host / device sync bug in PODVector (AMReX-Codes#2890)

commit 06753e6
Author: Axel Huebl
Date:   Tue Jul 26 12:54:35 2022 -0700

    `TagBoxArray::collate`: Fujitsu Clang (AMReX-Codes#2889)

    `mpiFCC -Nclang` only defines `__CLANG_FUJITSU`, not `__FUJITSU` as in the classic compiler mode.

commit 7cf77dc
Author: Weiqun Zhang
Date:   Tue Jul 26 11:01:21 2022 -0700

    MinLoc and MaxLoc Support (AMReX-Codes#2885)

    Add struct ValLocPair that can be used by ReduceOps/ReduceData and ParReduce to find the location of the min/max value.

    Add a warp shuffle down function for more general types. This is needed for MinLoc/MaxLoc with CUDA < 11, because we don't use CUB for earlier versions of CUDA.

    The Intel GPU support is not done yet. We need to allocate enough shared local memory when the size of ValLocPair is larger than the size of unsigned long long.
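
A MinLoc reduction, as enabled by ValLocPair in 7cf77dc and the MPI reductions in 735c351, carries the index along with the value and combines pairs with one binary operation. The sketch below is a serial illustration with a hypothetical `ValLoc` type, not `amrex::ValLocPair` itself. Breaking ties toward the smaller index is a choice made here. It turns the combine step into a lexicographic minimum, which is associative and commutative, so partial results from threads or ranks can be merged in any order.

```cpp
#include <array>
#include <cstddef>
#include <limits>

// Hypothetical value/location pair (not the actual amrex::ValLocPair).
template <typename TV, typename TI>
struct ValLoc {
    TV value;
    TI index;
};

// Combine step of a MinLoc reduction. Ties go to the smaller index.
template <typename TV, typename TI>
constexpr ValLoc<TV, TI> min_loc(ValLoc<TV, TI> a, ValLoc<TV, TI> b)
{
    if (b.value < a.value || (b.value == a.value && b.index < a.index)) {
        return b;
    }
    return a;
}

// Serial reduction; a parallel version would merge per-thread or per-rank
// partial results with the same min_loc function.
template <std::size_t N>
constexpr ValLoc<double, long> find_min_loc(const std::array<double, N>& v)
{
    ValLoc<double, long> acc{std::numeric_limits<double>::max(), -1};
    for (std::size_t i = 0; i < N; ++i) {
        acc = min_loc(acc, ValLoc<double, long>{v[i], static_cast<long>(i)});
    }
    return acc;
}
```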
commit 4b7e200 Author: Weiqun Zhang <[email protected]> Date: Thu Jul 21 10:25:57 2022 -0700
HIP: Remove the call to hipDeviceSetSharedMemConfig (AMReX-Codes#2884)
AMD devices do not support shared cache banking. Thanks @afanfa for reporting this. (AMReX-Codes#2883)

commit 8e40952 Author: Weiqun Zhang <[email protected]> Date: Wed Jul 20 12:10:26 2022 -0700
Add Frontier to GNU Make (AMReX-Codes#2879)

commit b673d81 Author: Max Katz <[email protected]> Date: Mon Jul 18 15:14:19 2022 -0400
Add option to derefine to AMRErrorTag (AMReX-Codes#2875)
This allows a refinement field to specify *derefinement* (by setting a zone's tagging value to the clear value).

commit 73dbf2f Author: hengjiew <[email protected]> Date: Mon Jul 18 12:53:35 2022 -0400
Fix the segmentation fault in selecting actual neighbor particles. (AMReX-Codes#2877)

commit 40b3d21 Author: Weiqun Zhang <[email protected]> Date: Wed Jul 13 13:24:15 2022 -0700
Add extra braces in initialization of GpuArray (AMReX-Codes#2876)
It should not be needed since C++14. But some compilers seem to need the double braces.

commit a633d2b Author: Luca Fedeli <[email protected]> Date: Fri Jul 8 20:34:18 2022 +0200
Workaround to bypass issue observed at very large scale with Fujitsu MPI (AMReX-Codes#2874)
We have observed some MPI issues at very large scale when WarpX is compiled using Fujitsu MPI (i.e., with the Fujitsu compiler). These issues seem to be related to the use of MPI Gatherv with MPI_Datatype. This PR implements a possible workaround, initially proposed by @WeiqunZhang. The idea is that, when WarpX is compiled with the Fujitsu compiler, simpler integer arrays instead of MPI_Datatype are used in the routine where the issue was observed.

commit 7660c88 Author: Weiqun Zhang <[email protected]> Date: Fri Jul 8 08:48:14 2022 -0700
Allow zero components MultiFab and BaseFab (AMReX-Codes#2873)
This is useful for particle I/O that does not have any mesh data. yt needs a header file associated with a MultiFab.
commit c849dd1 Author: Weiqun Zhang <[email protected]> Date: Fri Jul 8 08:06:37 2022 -0700
New EB optimization parameter: eb2.num_coarsen_opt (AMReX-Codes#2872)
At the beginning of EB generation, we chop the entire finest domain into boxes and find out the type of the boxes. We then collect the completely covered boxes and cut boxes into two BoxArrays. This process can be costly because of the number of calls to the implicit functions. In this commit, we have introduced a new ParmParse parameter, eb2.num_coarsen_opt with a default value of zero. If for instance it is set to 3, we start the box type categorization at a resolution that is coarsened by a factor of 2^3. For the provisional cut boxes, we refine them by a factor of 2. Then we chop them into small boxes and categorize the new boxes. This process is performed recursively until we are at the original finest resolution. The users should be aware that, if eb2.num_coarsen_opt is too big, this could produce erroneous results because evaluating the implicit function on coarse boxes could miss fine structures in the EB. Thanks to Robert Marskar for sharing this algorithm.

commit 557aae8 Author: Erik <[email protected]> Date: Wed Jul 6 08:54:24 2022 -0700
point to new location of AMReX images, AMReX website repo (AMReX-Codes#2867)

commit cbdc658 Author: Axel Huebl <[email protected]> Date: Tue Jul 5 01:41:03 2022 +0200
SENSEI 4.0: Fix Build for Particles (AMReX-Codes#2869)
## Summary
This part causes a compile error now in WarpX. cc @burlen @kwryankrattiger
## Additional background
X-ref: Blocks WarpX 22.07 release ECP-WarpX/WarpX#3211
Follow-up to:
- AMReX-Codes#2785
- AMReX-Codes#2834

commit dc8b734 Author: Andrew Myers <[email protected]> Date: Fri Jul 1 17:19:20 2022 -0700
Cache the neighbor comm tags for the CPU implementation of fillNeighbors. (AMReX-Codes#2862)

* Cache the neighbor comm tags for the CPU implementation of fillNeighbors.
* fix areMasksValid function

commit 2b42fb5 Author: drangara <[email protected]> Date: Fri Jul 1 18:44:35 2022 -0400
Remove some hard checks in check_mvmc for 3D (AMReX-Codes#2864)
Removing some hard checks in 3D coarsening logic as it appears that those are not necessarily bad states, and a soft failure to coarsen should suffice.

commit 19c7068 Author: Erik <[email protected]> Date: Fri Jul 1 18:24:24 2022 -0400
Carry over fix for ngbxy.smallEnd typo (AMReX-Codes#2868)
This is a typo that got corrected in other places but didn't get fixed here.

commit d736ef2 Author: Weiqun Zhang <[email protected]> Date: Fri Jul 1 11:00:15 2022 -0700
Update CHANGES for 22.07 (AMReX-Codes#2866)

commit be813d0 Author: Weiqun Zhang <[email protected]> Date: Fri Jul 1 10:29:13 2022 -0700
Hypre: add version check (AMReX-Codes#2865)
These HYPRE_SetSp* functions are only available in hypre >= 22500.

commit 8fb23ec Author: Jon Rood <[email protected]> Date: Wed Jun 29 16:52:35 2022 -0600
Refactor Make.nrel to use MPT for MPI with the Intel compiler on Eagle. (AMReX-Codes#2861)

commit 6f9a46c Author: PaulMullowney <[email protected]> Date: Wed Jun 29 11:09:57 2022 -0600
Adding control APIs and namespacing for core algorithm paths like SpGEMM, SpMV, and SpTrans. (AMReX-Codes#2859)
Co-authored-by: Paul Mullowney <[email protected]>

commit e4c83cf Author: Jon Rood <[email protected]> Date: Wed Jun 29 11:08:42 2022 -0600
Add lib64 library location for ZFP since it may exist there instead of lib. (AMReX-Codes#2860)

commit b2b9150 Author: Burlen Loring <[email protected]> Date: Tue Jun 28 13:42:41 2022 -0700
update the SENSEI in situ coupling for SENSEI v4.0.0 (AMReX-Codes#2785)
In this release, an install of VTK is no longer required. To compile AMReX w/ SENSEI use:

```
cmake -DAMReX_SENSEI=ON -DSENSEI_DIR=<path to SENSEI install>/<lib dir>/cmake
```

Note: `<lib dir>` may be `lib` or `lib64` or something else depending on your OS and is determined by CMake at configure time.
See the CMake GNUInstallDirs documentation for more information.

commit 2c5f475 Author: Andrew Myers <[email protected]> Date: Tue Jun 28 12:51:19 2022 -0700
Write runtime attribs to checkpoints on GPUs (AMReX-Codes#2856)

commit d2cb546 Author: Jon Rood <[email protected]> Date: Tue Jun 28 13:27:02 2022 -0600
Fix gnu make on Crusher for mpi_gtl_hsa (AMReX-Codes#2857)
Update environment variable at OLCF for mpi_gtl_hsa.

commit 21fe4b3 Author: Axel Huebl <[email protected]> Date: Tue Jun 28 19:53:09 2022 +0200
CMake: FindDependency CUDAToolkit (AMReX-Codes#2849)
If we install AMReX with CUDA support using a modern CMake, we need to repopulate targets such as `CUDA::curand` from `find_dependency` for downstream. Downstream users find us via `find_package`, and that target link dependency turned out to be unpopulated in MFIX.

commit 027f2ff Author: Weiqun Zhang <[email protected]> Date: Thu Jun 23 16:15:57 2022 -0700
Fix make help (AMReX-Codes#2854)
This reverts the change in AMReX-Codes#2845, which fixed an issue with `make print-%`, but broke `make help`. This is now fixed in a different way. Both `make print-%` and `make help` should work now.

commit 3d3ad21 Author: kngott <[email protected]> Date: Thu Jun 23 13:39:59 2022 -0700
NERSC Programming Environment prototype (AMReX-Codes#2848)

commit 4872676 Author: Weiqun Zhang <[email protected]> Date: Thu Jun 23 12:41:20 2022 -0700
GNU Make: No need to query mpif90 if Fortran is not used. (AMReX-Codes#2852)
This minimizes potential issues.

commit fc0d646 Author: Weiqun Zhang <[email protected]> Date: Thu Jun 23 12:23:55 2022 -0700
Remove f90doc (AMReX-Codes#2851)
We no longer use it.

commit 5188a6a Author: Weiqun Zhang <[email protected]> Date: Thu Jun 23 11:09:15 2022 -0700
Explicitly invoke python3 (AMReX-Codes#2850)
According to PEP 394, a python distributor may choose to not provide the python command. In fact, that's what recent versions of macOS do.
commit 2d931f6 Author: Andrew Myers <[email protected]> Date: Wed Jun 22 15:03:50 2022 -0500
Maintain the high end of the 'roundoff domain' in both float and double precision (AMReX-Codes#2839)

* Maintain the high end of the 'roundoff domain' in both float and double precision
* fix shadowing
* fix warning
* fix float conversion warning
* fix logic
* Update Src/Base/AMReX_Geometry.H
* Update Src/Base/AMReX_Geometry.H

nmnobre · 2022-11-11T18:12:38Z

Hi @WeiqunZhang,

Thank you for this and for your work in #3024.

This change relies on CL/sycl.hpp to either:

- inline the `cl` namespace, thereby exposing the `sycl` namespace, as in current Intel compiler versions; or
- expose the `sycl` namespace, as in future Intel compiler versions.

The latter is unfortunate, as it is non-standard: we should be including `sycl/sycl.hpp`, which, as you know, has never been part of Intel's production/consumer compilers (as opposed to their open-source version). I suppose AMReX never set out to offer pure SYCL compatibility; it was always about Intel's DPC++ flavour, so I'm guessing this change was made with that mindset?

Cheers,
-Nuno
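To make the first header layout described above concrete, here is a self-contained mock that compiles without any SYCL implementation. The namespace bodies and the `queue_id()` member are hypothetical stand-ins, not Intel's actual headers. With `cl` declared as an inline namespace at global scope, `sycl::` already resolves to `cl::sycl::`, so an AMReX-side `namespace sycl = cl::sycl;` adds nothing. In the second layout, where `sycl` is itself a namespace at global scope, such an alias would try to reuse a name that is already taken, which is presumably the kind of conflict this PR's description refers to.

```cpp
#include <cassert>
#include <type_traits>

// Mock of layout 1 (hypothetical, for illustration only): `cl` is an inline
// namespace at global scope, so its member namespace `sycl` is also found
// by unqualified lookup of `sycl` at global scope.
inline namespace cl {
namespace sycl {
// Hypothetical stand-in for a real SYCL type.
struct queue {
    int queue_id() const { return 1; }
};
}  // namespace sycl
}  // namespace cl

// Both spellings name the same type, so no `namespace sycl = cl::sycl;`
// alias is needed in this layout.
static_assert(std::is_same<sycl::queue, cl::sycl::queue>::value,
              "sycl:: and cl::sycl:: should name the same entity");

inline int use_new_spelling() { return sycl::queue{}.queue_id(); }
inline int use_legacy_spelling() { return cl::sycl::queue{}.queue_id(); }
```

Code written against the `sycl::` spelling therefore works with either layout, which is why removing the alias is the safer choice.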

WeiqunZhang · 2022-11-11T18:19:45Z

@nmnobre What SYCL compiler are you using? If possible, we could add a CI test for it.

nmnobre · 2022-11-11T18:28:45Z

I'm using both hipSYCL and Intel's open-source DPC++, both of which ship `sycl/sycl.hpp`.
I should probably say I'm not sure what the right choice here is.
If it were up to me, I'd ask Intel to just add `sycl/sycl.hpp`, but I'm not sure where I'd ask for that, and I'm sure they have their reasons... unless you have insider knowledge that they'll be doing that starting with version 2023.1 :P
I'm happy to continue with my own fork carrying the patches I need in AMReX for my use case; it's probably wiser for AMReX to support the production-ready compilers?...

WeiqunZhang · 2022-11-11T18:35:40Z

Could you let me know what macros the open-source DPC++ compiler defines? `dpcpp -dM -E - < /dev/null`

nmnobre · 2022-11-11T18:38:42Z

`clang++ -dM -E - < /dev/null` gives:

#define _LP64 1
#define __ATOMIC_ACQUIRE 2
#define __ATOMIC_ACQ_REL 4
#define __ATOMIC_CONSUME 1
#define __ATOMIC_RELAXED 0
#define __ATOMIC_RELEASE 3
#define __ATOMIC_SEQ_CST 5
#define __BIGGEST_ALIGNMENT__ 16
#define __BITINT_MAXWIDTH__ 128
#define __BOOL_WIDTH__ 8
#define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__
#define __CHAR16_TYPE__ unsigned short
#define __CHAR32_TYPE__ unsigned int
#define __CHAR_BIT__ 8
#define __CLANG_ATOMIC_BOOL_LOCK_FREE 2
#define __CLANG_ATOMIC_CHAR16_T_LOCK_FREE 2
#define __CLANG_ATOMIC_CHAR32_T_LOCK_FREE 2
#define __CLANG_ATOMIC_CHAR_LOCK_FREE 2
#define __CLANG_ATOMIC_INT_LOCK_FREE 2
#define __CLANG_ATOMIC_LLONG_LOCK_FREE 2
#define __CLANG_ATOMIC_LONG_LOCK_FREE 2
#define __CLANG_ATOMIC_POINTER_LOCK_FREE 2
#define __CLANG_ATOMIC_SHORT_LOCK_FREE 2
#define __CLANG_ATOMIC_WCHAR_T_LOCK_FREE 2
#define __CONSTANT_CFSTRINGS__ 1
#define __DBL_DECIMAL_DIG__ 17
#define __DBL_DENORM_MIN__ 4.9406564584124654e-324
#define __DBL_DIG__ 15
#define __DBL_EPSILON__ 2.2204460492503131e-16
#define __DBL_HAS_DENORM__ 1
#define __DBL_HAS_INFINITY__ 1
#define __DBL_HAS_QUIET_NAN__ 1
#define __DBL_MANT_DIG__ 53
#define __DBL_MAX_10_EXP__ 308
#define __DBL_MAX_EXP__ 1024
#define __DBL_MAX__ 1.7976931348623157e+308
#define __DBL_MIN_10_EXP__ (-307)
#define __DBL_MIN_EXP__ (-1021)
#define __DBL_MIN__ 2.2250738585072014e-308
#define __DECIMAL_DIG__ __LDBL_DECIMAL_DIG__
#define __ELF__ 1
#define __FINITE_MATH_ONLY__ 0
#define __FLOAT128__ 1
#define __FLT16_DECIMAL_DIG__ 5
#define __FLT16_DENORM_MIN__ 5.9604644775390625e-8F16
#define __FLT16_DIG__ 3
#define __FLT16_EPSILON__ 9.765625e-4F16
#define __FLT16_HAS_DENORM__ 1
#define __FLT16_HAS_INFINITY__ 1
#define __FLT16_HAS_QUIET_NAN__ 1
#define __FLT16_MANT_DIG__ 11
#define __FLT16_MAX_10_EXP__ 4
#define __FLT16_MAX_EXP__ 16
#define __FLT16_MAX__ 6.5504e+4F16
#define __FLT16_MIN_10_EXP__ (-4)
#define __FLT16_MIN_EXP__ (-13)
#define __FLT16_MIN__ 6.103515625e-5F16
#define __FLT_DECIMAL_DIG__ 9
#define __FLT_DENORM_MIN__ 1.40129846e-45F
#define __FLT_DIG__ 6
#define __FLT_EPSILON__ 1.19209290e-7F
#define __FLT_HAS_DENORM__ 1
#define __FLT_HAS_INFINITY__ 1
#define __FLT_HAS_QUIET_NAN__ 1
#define __FLT_MANT_DIG__ 24
#define __FLT_MAX_10_EXP__ 38
#define __FLT_MAX_EXP__ 128
#define __FLT_MAX__ 3.40282347e+38F
#define __FLT_MIN_10_EXP__ (-37)
#define __FLT_MIN_EXP__ (-125)
#define __FLT_MIN__ 1.17549435e-38F
#define __FLT_RADIX__ 2
#define __FXSR__ 1
#define __GCC_ASM_FLAG_OUTPUTS__ 1
#define __GCC_ATOMIC_BOOL_LOCK_FREE 2
#define __GCC_ATOMIC_CHAR16_T_LOCK_FREE 2
#define __GCC_ATOMIC_CHAR32_T_LOCK_FREE 2
#define __GCC_ATOMIC_CHAR_LOCK_FREE 2
#define __GCC_ATOMIC_INT_LOCK_FREE 2
#define __GCC_ATOMIC_LLONG_LOCK_FREE 2
#define __GCC_ATOMIC_LONG_LOCK_FREE 2
#define __GCC_ATOMIC_POINTER_LOCK_FREE 2
#define __GCC_ATOMIC_SHORT_LOCK_FREE 2
#define __GCC_ATOMIC_TEST_AND_SET_TRUEVAL 1
#define __GCC_ATOMIC_WCHAR_T_LOCK_FREE 2
#define __GCC_HAVE_DWARF2_CFI_ASM 1
#define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_1 1
#define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_2 1
#define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 1
#define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8 1
#define __GNUC_MINOR__ 2
#define __GNUC_PATCHLEVEL__ 1
#define __GNUC_STDC_INLINE__ 1
#define __GNUC__ 4
#define __GXX_ABI_VERSION 1002
#define __INT16_C_SUFFIX__ 
#define __INT16_FMTd__ "hd"
#define __INT16_FMTi__ "hi"
#define __INT16_MAX__ 32767
#define __INT16_TYPE__ short
#define __INT32_C_SUFFIX__ 
#define __INT32_FMTd__ "d"
#define __INT32_FMTi__ "i"
#define __INT32_MAX__ 2147483647
#define __INT32_TYPE__ int
#define __INT64_C_SUFFIX__ L
#define __INT64_FMTd__ "ld"
#define __INT64_FMTi__ "li"
#define __INT64_MAX__ 9223372036854775807L
#define __INT64_TYPE__ long int
#define __INT8_C_SUFFIX__ 
#define __INT8_FMTd__ "hhd"
#define __INT8_FMTi__ "hhi"
#define __INT8_MAX__ 127
#define __INT8_TYPE__ signed char
#define __INTMAX_C_SUFFIX__ L
#define __INTMAX_FMTd__ "ld"
#define __INTMAX_FMTi__ "li"
#define __INTMAX_MAX__ 9223372036854775807L
#define __INTMAX_TYPE__ long int
#define __INTMAX_WIDTH__ 64
#define __INTPTR_FMTd__ "ld"
#define __INTPTR_FMTi__ "li"
#define __INTPTR_MAX__ 9223372036854775807L
#define __INTPTR_TYPE__ long int
#define __INTPTR_WIDTH__ 64
#define __INT_FAST16_FMTd__ "hd"
#define __INT_FAST16_FMTi__ "hi"
#define __INT_FAST16_MAX__ 32767
#define __INT_FAST16_TYPE__ short
#define __INT_FAST16_WIDTH__ 16
#define __INT_FAST32_FMTd__ "d"
#define __INT_FAST32_FMTi__ "i"
#define __INT_FAST32_MAX__ 2147483647
#define __INT_FAST32_TYPE__ int
#define __INT_FAST32_WIDTH__ 32
#define __INT_FAST64_FMTd__ "ld"
#define __INT_FAST64_FMTi__ "li"
#define __INT_FAST64_MAX__ 9223372036854775807L
#define __INT_FAST64_TYPE__ long int
#define __INT_FAST64_WIDTH__ 64
#define __INT_FAST8_FMTd__ "hhd"
#define __INT_FAST8_FMTi__ "hhi"
#define __INT_FAST8_MAX__ 127
#define __INT_FAST8_TYPE__ signed char
#define __INT_FAST8_WIDTH__ 8
#define __INT_LEAST16_FMTd__ "hd"
#define __INT_LEAST16_FMTi__ "hi"
#define __INT_LEAST16_MAX__ 32767
#define __INT_LEAST16_TYPE__ short
#define __INT_LEAST16_WIDTH__ 16
#define __INT_LEAST32_FMTd__ "d"
#define __INT_LEAST32_FMTi__ "i"
#define __INT_LEAST32_MAX__ 2147483647
#define __INT_LEAST32_TYPE__ int
#define __INT_LEAST32_WIDTH__ 32
#define __INT_LEAST64_FMTd__ "ld"
#define __INT_LEAST64_FMTi__ "li"
#define __INT_LEAST64_MAX__ 9223372036854775807L
#define __INT_LEAST64_TYPE__ long int
#define __INT_LEAST64_WIDTH__ 64
#define __INT_LEAST8_FMTd__ "hhd"
#define __INT_LEAST8_FMTi__ "hhi"
#define __INT_LEAST8_MAX__ 127
#define __INT_LEAST8_TYPE__ signed char
#define __INT_LEAST8_WIDTH__ 8
#define __INT_MAX__ 2147483647
#define __INT_WIDTH__ 32
#define __LDBL_DECIMAL_DIG__ 21
#define __LDBL_DENORM_MIN__ 3.64519953188247460253e-4951L
#define __LDBL_DIG__ 18
#define __LDBL_EPSILON__ 1.08420217248550443401e-19L
#define __LDBL_HAS_DENORM__ 1
#define __LDBL_HAS_INFINITY__ 1
#define __LDBL_HAS_QUIET_NAN__ 1
#define __LDBL_MANT_DIG__ 64
#define __LDBL_MAX_10_EXP__ 4932
#define __LDBL_MAX_EXP__ 16384
#define __LDBL_MAX__ 1.18973149535723176502e+4932L
#define __LDBL_MIN_10_EXP__ (-4931)
#define __LDBL_MIN_EXP__ (-16381)
#define __LDBL_MIN__ 3.36210314311209350626e-4932L
#define __LITTLE_ENDIAN__ 1
#define __LLONG_WIDTH__ 64
#define __LONG_LONG_MAX__ 9223372036854775807LL
#define __LONG_MAX__ 9223372036854775807L
#define __LONG_WIDTH__ 64
#define __LP64__ 1
#define __MMX__ 1
#define __NO_INLINE__ 1
#define __NO_MATH_INLINES 1
#define __OBJC_BOOL_IS_BOOL 0
#define __OPENCL_MEMORY_SCOPE_ALL_SVM_DEVICES 3
#define __OPENCL_MEMORY_SCOPE_DEVICE 2
#define __OPENCL_MEMORY_SCOPE_SUB_GROUP 4
#define __OPENCL_MEMORY_SCOPE_WORK_GROUP 1
#define __OPENCL_MEMORY_SCOPE_WORK_ITEM 0
#define __ORDER_BIG_ENDIAN__ 4321
#define __ORDER_LITTLE_ENDIAN__ 1234
#define __ORDER_PDP_ENDIAN__ 3412
#define __POINTER_WIDTH__ 64
#define __PRAGMA_REDEFINE_EXTNAME 1
#define __PTRDIFF_FMTd__ "ld"
#define __PTRDIFF_FMTi__ "li"
#define __PTRDIFF_MAX__ 9223372036854775807L
#define __PTRDIFF_TYPE__ long int
#define __PTRDIFF_WIDTH__ 64
#define __REGISTER_PREFIX__ 
#define __SCHAR_MAX__ 127
#define __SEG_FS 1
#define __SEG_GS 1
#define __SHRT_MAX__ 32767
#define __SHRT_WIDTH__ 16
#define __SIG_ATOMIC_MAX__ 2147483647
#define __SIG_ATOMIC_WIDTH__ 32
#define __SIZEOF_DOUBLE__ 8
#define __SIZEOF_FLOAT128__ 16
#define __SIZEOF_FLOAT__ 4
#define __SIZEOF_INT128__ 16
#define __SIZEOF_INT__ 4
#define __SIZEOF_LONG_DOUBLE__ 16
#define __SIZEOF_LONG_LONG__ 8
#define __SIZEOF_LONG__ 8
#define __SIZEOF_POINTER__ 8
#define __SIZEOF_PTRDIFF_T__ 8
#define __SIZEOF_SHORT__ 2
#define __SIZEOF_SIZE_T__ 8
#define __SIZEOF_WCHAR_T__ 4
#define __SIZEOF_WINT_T__ 4
#define __SIZE_FMTX__ "lX"
#define __SIZE_FMTo__ "lo"
#define __SIZE_FMTu__ "lu"
#define __SIZE_FMTx__ "lx"
#define __SIZE_MAX__ 18446744073709551615UL
#define __SIZE_TYPE__ long unsigned int
#define __SIZE_WIDTH__ 64
#define __SSE2_MATH__ 1
#define __SSE2__ 1
#define __SSE_MATH__ 1
#define __SSE__ 1
#define __STDC_HOSTED__ 1
#define __STDC_UTF_16__ 1
#define __STDC_UTF_32__ 1
#define __STDC_VERSION__ 201710L
#define __STDC__ 1
#define __UINT16_C_SUFFIX__ 
#define __UINT16_FMTX__ "hX"
#define __UINT16_FMTo__ "ho"
#define __UINT16_FMTu__ "hu"
#define __UINT16_FMTx__ "hx"
#define __UINT16_MAX__ 65535
#define __UINT16_TYPE__ unsigned short
#define __UINT32_C_SUFFIX__ U
#define __UINT32_FMTX__ "X"
#define __UINT32_FMTo__ "o"
#define __UINT32_FMTu__ "u"
#define __UINT32_FMTx__ "x"
#define __UINT32_MAX__ 4294967295U
#define __UINT32_TYPE__ unsigned int
#define __UINT64_C_SUFFIX__ UL
#define __UINT64_FMTX__ "lX"
#define __UINT64_FMTo__ "lo"
#define __UINT64_FMTu__ "lu"
#define __UINT64_FMTx__ "lx"
#define __UINT64_MAX__ 18446744073709551615UL
#define __UINT64_TYPE__ long unsigned int
#define __UINT8_C_SUFFIX__ 
#define __UINT8_FMTX__ "hhX"
#define __UINT8_FMTo__ "hho"
#define __UINT8_FMTu__ "hhu"
#define __UINT8_FMTx__ "hhx"
#define __UINT8_MAX__ 255
#define __UINT8_TYPE__ unsigned char
#define __UINTMAX_C_SUFFIX__ UL
#define __UINTMAX_FMTX__ "lX"
#define __UINTMAX_FMTo__ "lo"
#define __UINTMAX_FMTu__ "lu"
#define __UINTMAX_FMTx__ "lx"
#define __UINTMAX_MAX__ 18446744073709551615UL
#define __UINTMAX_TYPE__ long unsigned int
#define __UINTMAX_WIDTH__ 64
#define __UINTPTR_FMTX__ "lX"
#define __UINTPTR_FMTo__ "lo"
#define __UINTPTR_FMTu__ "lu"
#define __UINTPTR_FMTx__ "lx"
#define __UINTPTR_MAX__ 18446744073709551615UL
#define __UINTPTR_TYPE__ long unsigned int
#define __UINTPTR_WIDTH__ 64
#define __UINT_FAST16_FMTX__ "hX"
#define __UINT_FAST16_FMTo__ "ho"
#define __UINT_FAST16_FMTu__ "hu"
#define __UINT_FAST16_FMTx__ "hx"
#define __UINT_FAST16_MAX__ 65535
#define __UINT_FAST16_TYPE__ unsigned short
#define __UINT_FAST32_FMTX__ "X"
#define __UINT_FAST32_FMTo__ "o"
#define __UINT_FAST32_FMTu__ "u"
#define __UINT_FAST32_FMTx__ "x"
#define __UINT_FAST32_MAX__ 4294967295U
#define __UINT_FAST32_TYPE__ unsigned int
#define __UINT_FAST64_FMTX__ "lX"
#define __UINT_FAST64_FMTo__ "lo"
#define __UINT_FAST64_FMTu__ "lu"
#define __UINT_FAST64_FMTx__ "lx"
#define __UINT_FAST64_MAX__ 18446744073709551615UL
#define __UINT_FAST64_TYPE__ long unsigned int
#define __UINT_FAST8_FMTX__ "hhX"
#define __UINT_FAST8_FMTo__ "hho"
#define __UINT_FAST8_FMTu__ "hhu"
#define __UINT_FAST8_FMTx__ "hhx"
#define __UINT_FAST8_MAX__ 255
#define __UINT_FAST8_TYPE__ unsigned char
#define __UINT_LEAST16_FMTX__ "hX"
#define __UINT_LEAST16_FMTo__ "ho"
#define __UINT_LEAST16_FMTu__ "hu"
#define __UINT_LEAST16_FMTx__ "hx"
#define __UINT_LEAST16_MAX__ 65535
#define __UINT_LEAST16_TYPE__ unsigned short
#define __UINT_LEAST32_FMTX__ "X"
#define __UINT_LEAST32_FMTo__ "o"
#define __UINT_LEAST32_FMTu__ "u"
#define __UINT_LEAST32_FMTx__ "x"
#define __UINT_LEAST32_MAX__ 4294967295U
#define __UINT_LEAST32_TYPE__ unsigned int
#define __UINT_LEAST64_FMTX__ "lX"
#define __UINT_LEAST64_FMTo__ "lo"
#define __UINT_LEAST64_FMTu__ "lu"
#define __UINT_LEAST64_FMTx__ "lx"
#define __UINT_LEAST64_MAX__ 18446744073709551615UL
#define __UINT_LEAST64_TYPE__ long unsigned int
#define __UINT_LEAST8_FMTX__ "hhX"
#define __UINT_LEAST8_FMTo__ "hho"
#define __UINT_LEAST8_FMTu__ "hhu"
#define __UINT_LEAST8_FMTx__ "hhx"
#define __UINT_LEAST8_MAX__ 255
#define __UINT_LEAST8_TYPE__ unsigned char
#define __USER_LABEL_PREFIX__ 
#define __VERSION__ "Clang 16.0.0"
#define __WCHAR_MAX__ 2147483647
#define __WCHAR_TYPE__ int
#define __WCHAR_WIDTH__ 32
#define __WINT_MAX__ 4294967295U
#define __WINT_TYPE__ unsigned int
#define __WINT_UNSIGNED__ 1
#define __WINT_WIDTH__ 32
#define __amd64 1
#define __amd64__ 1
#define __clang__ 1
#define __clang_literal_encoding__ "UTF-8"
#define __clang_major__ 16
#define __clang_minor__ 0
#define __clang_patchlevel__ 0
#define __clang_version__ "16.0.0 "
#define __clang_wide_literal_encoding__ "UTF-32"
#define __code_model_small__ 1
#define __gnu_linux__ 1
#define __k8 1
#define __k8__ 1
#define __linux 1
#define __linux__ 1
#define __llvm__ 1
#define __seg_fs __attribute__((address_space(257)))
#define __seg_gs __attribute__((address_space(256)))
#define __tune_k8__ 1
#define __unix 1
#define __unix__ 1
#define __x86_64 1
#define __x86_64__ 1
#define linux 1
#define unix 1

WeiqunZhang · 2022-11-11T18:42:21Z

OK. We should be able to use `__INTEL_LLVM_COMPILER` to detect Intel's oneAPI compiler.
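A minimal sketch of that detection, assuming the selection rule discussed in this thread (Intel's oneAPI compiler uses `CL/sycl.hpp`, other SYCL compilers use `sycl/sycl.hpp`). `kIntelOneAPI` and `sycl_header` are hypothetical names used only for illustration. `__INTEL_LLVM_COMPILER` does not appear in the open-source clang++ macro dump above, which is what makes it usable as a discriminator.

```cpp
#include <cassert>
#include <string_view>

// True only when compiling with Intel's oneAPI LLVM-based compiler,
// which predefines __INTEL_LLVM_COMPILER.
constexpr bool kIntelOneAPI =
#if defined(__INTEL_LLVM_COMPILER)
    true;
#else
    false;
#endif

// Hypothetical helper: which SYCL header to include for a given toolchain.
// Real code would make this choice directly in the preprocessor, e.g.
//   #if defined(__INTEL_LLVM_COMPILER)
//   #  include <CL/sycl.hpp>
//   #else
//   #  include <sycl/sycl.hpp>
//   #endif
constexpr std::string_view sycl_header(bool intel_oneapi) {
    return intel_oneapi ? "CL/sycl.hpp" : "sycl/sycl.hpp";
}
```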

nmnobre · 2022-11-11T18:46:53Z

Indeed. But honestly, you are too kind; I don't want you to clutter the code if this is something I'd be the only user of.
We could wait for 2023.1 (is that the next version?) and see if they include sycl/sycl.hpp, at which point it's a trivial patch for us.

WeiqunZhang · 2022-11-11T18:55:49Z

Thank you for your contribution! I will add a commit to #3024. Both CL/sycl.hpp and sycl/sycl.hpp work in the latest Intel compiler. When I get time, I will try to set up a CI with hipSYCL.
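Since both headers work in the latest Intel compiler, another option (sketched here, not necessarily what AMReX does) is to let C++17 `__has_include` pick whichever header the toolchain ships, preferring the SYCL 2020 spelling `<sycl/sycl.hpp>`. `kSyclHeader` is a hypothetical name; in real code each branch would `#include` the header instead of recording its name.

```cpp
#include <cassert>
#include <string_view>

// Record which SYCL header this toolchain can see, preferring the
// SYCL 2020 spelling. Real code would #include the header in each branch.
#if __has_include(<sycl/sycl.hpp>)
constexpr std::string_view kSyclHeader = "sycl/sycl.hpp";
#elif __has_include(<CL/sycl.hpp>)
constexpr std::string_view kSyclHeader = "CL/sycl.hpp";
#else
constexpr std::string_view kSyclHeader = "";  // no SYCL headers visible
#endif
```

The design trade-off: `__has_include` tests for the header itself rather than for a particular vendor, so hipSYCL and open-source DPC++ would take the first branch without any compiler-specific macro checks.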

Remove sycl namespace alias

c888271

This causes a conflict with new compilers.

WeiqunZhang requested review from jmsexton03 and kngott October 3, 2022 16:55

jmsexton03 approved these changes Oct 3, 2022


WeiqunZhang merged commit 1bc4e4e into AMReX-Codes:development Oct 3, 2022

WeiqunZhang deleted the namespace_sycl branch October 3, 2022 23:50
