Fix EB data inconsistency when fixing small cells and multiple cuts #2943

Merged
merged 1 commit into from
Oct 14, 2022

Conversation

WeiqunZhang
Member

Summary

For consistency, we need to call the function that zeros out the level set even if that box does not have any small cells or multiple cuts. This is because a node could exist in multiple boxes. Furthermore, a covered cell or covered face may have a node with a level set < 0.

Additional background

This is usually not an issue. However, in WarpX, we use the level set to decide whether a node is an unknown in the linear system. The inconsistency makes the solver fail in some cases.

Checklist

The proposed changes:

  • fix a bug or incorrect behavior in AMReX
  • add new capabilities to AMReX
  • changes answers in the test suite to more than roundoff level
  • are likely to significantly affect the results of downstream AMReX users
  • include documentation in the code and/or rst files, if appropriate

@WeiqunZhang
Member Author

@drangara

@drangara
Contributor

drangara commented Sep 9, 2022

LGTM

@WeiqunZhang WeiqunZhang enabled auto-merge (squash) October 13, 2022 17:59
@WeiqunZhang WeiqunZhang merged commit 975b830 into AMReX-Codes:development Oct 14, 2022
@WeiqunZhang WeiqunZhang deleted the fix_eb_levset branch November 2, 2022 00:54
atmyers added a commit to Thierry992/amrex that referenced this pull request Nov 2, 2022
commit 10e99fb
Merge: d03045d f1e1d6f
Author: Andrew Myers <[email protected]>
Date:   Wed Nov 2 14:06:00 2022 -0700

    Merge branch 'particle_soa_refactor' of github.com:Thierry992/amrex into HEAD

commit d03045d
Author: Andrew Myers <[email protected]>
Date:   Wed Nov 2 14:04:23 2022 -0700

    fix buffer pack / unpack

commit d771fc8
Author: Andrew Myers <[email protected]>
Date:   Wed Nov 2 14:04:08 2022 -0700

    revert to one int for each id for now

commit f1e1d6f
Merge: 4dbfbac c4a4811
Author: Axel Huebl <[email protected]>
Date:   Tue Nov 1 15:18:54 2022 -0500

    Merge remote-tracking branch 'mainline/development' into particle_soa_refactor

commit c4a4811
Author: Axel Huebl <[email protected]>
Date:   Tue Nov 1 14:08:38 2022 -0500

    C++17 Transition (AMReX-Codes#2992)

    ## Summary

    Update AMReX to require C++17 or newer.

    - [x] docs
    - [x] CMake
    - [x] GNUmake
    - [x] CI

    ## Additional background

    Requires a mature [C++17](https://en.wikipedia.org/wiki/C%2B%2B17)
    compiler, e.g., GCC 8, Clang 7, NVCC 11.0, MSVC 19.15 or newer.

    Already used in production for over a year by downstream codes such as
    Castro and WarpX. Needed for modernization and new features such as
    AMReX-Codes#2878

    Co-authored-by: Weiqun Zhang <[email protected]>

commit d2b8293
Author: Weiqun Zhang <[email protected]>
Date:   Tue Nov 1 09:01:54 2022 -0700

    Update CHANGES for 22.11 (AMReX-Codes#3006)

commit 5ec270b
Author: Weiqun Zhang <[email protected]>
Date:   Tue Nov 1 08:59:44 2022 -0700

    Fix compilation for PETSc (AMReX-Codes#3005)

    We cannot include PETSc headers too early because it might redefine MPI
    routines as macros
    (https://github.com/petsc/petsc/blob/main/include/petsclog.h#L441). They
    break MPI calls like below,

        MPI_Allreduce(&tmp, &vi, 1,
                      ParallelDescriptor::Mpi_typemap<T>::type(),
                      ParallelDescriptor::Mpi_op<T,amrex::Greater<T>>(), comm);

    because of the `,` in `<T,amrex::Greater<T>>`.
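
The failure mode described above can be reproduced without PETSc or MPI. The sketch below is only illustrative: `lib_reduce`, `reduce_impl` and `op_id` are hypothetical names, and the two workarounds shown are not what the commit does (the commit avoids the problem by including the PETSc headers later).

```cpp
#include <functional>

// Hypothetical stand-in for a routine that a library header redefines as a
// function-like macro. op 0 adds, any other op takes the maximum.
inline int reduce_impl(int a, int b, int op)
{
    return op == 0 ? a + b : (a > b ? a : b);
}
#define lib_reduce(a, b, op) reduce_impl((a), (b), (op))

// Hypothetical helper with two template parameters, so that its template-id
// contains a comma that is not protected by parentheses.
template <typename T, typename Op>
int op_id() { return 1; }

inline int max_through_macro(int a, int b)
{
    // lib_reduce(a, b, op_id<int,std::greater<int>>());
    // would not compile: the preprocessor splits on the comma inside <...>
    // and sees four macro arguments instead of three.

    // Workaround 1: wrap the argument in parentheses.
    int const r1 = lib_reduce(a, b, (op_id<int, std::greater<int>>()));

    // Workaround 2: evaluate the expression before the macro is involved.
    int const op = op_id<int, std::greater<int>>();
    int const r2 = lib_reduce(a, b, op);

    return r1 == r2 ? r1 : -1;
}
```

Both workarounds require touching every call site, which is presumably why reordering the includes is the less invasive fix.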

commit 735c351
Author: Weiqun Zhang <[email protected]>
Date:   Sat Oct 29 10:57:23 2022 -0700

    MPI Reduce for ValLocPair (AMReX-Codes#3003)

    Add ParallelReduce::Min, ParallelReduce::Max, ParallelAllReduce::Min,
    and ParallelAllReduce::Max for ValLocPair<TV,TI>, where TV and TI are
    types that have corresponding MPI types (e.g., int, Real, IntVect, Box,
    etc.).

commit 3ec0768
Author: Axel Huebl <[email protected]>
Date:   Wed Oct 26 16:49:40 2022 -0700

    `FabArray::isDefined` (AMReX-Codes#2997)

    ## Summary

    Add a new query to `define_function_called`.

    ## Additional background

    This is a cheaper check than `ok()` for finding out if a MultiFab has
    been allocated or not yet, assuming that the calling code follows the
    convention that `define()` is called collectively.

    Update: It turns out you can also call `empty` inherited from
    `FabArrayBase`. The new API is quite explicit, which is ok, too.

    Co-authored-by: Weiqun Zhang <[email protected]>

commit 7f3c908
Author: Weiqun Zhang <[email protected]>
Date:   Wed Oct 26 16:40:16 2022 -0700

    Make The_Device_Arena non-managed (AMReX-Codes#2998)

    The_Device_Arena used to be a separate Arena. We changed it to be an
    alias of The_Arena to avoid memory fragmentation. However, the issue is
    we don't have an Arena that can allocate non-managed memory unless
    The_Arena is not managed. Because of performance concerns, we sometimes
    want to allocate non-managed memory. Therefore, we make The_Device_Arena
    an alias if and only if The_Arena is not managed.

commit ab8c892
Author: Weiqun Zhang <[email protected]>
Date:   Wed Oct 26 15:59:39 2022 -0700

    Add alias template Gpu::NonManagedDeviceVector (AMReX-Codes#2999)

commit b3e0a62
Author: Weiqun Zhang <[email protected]>
Date:   Wed Oct 26 15:02:13 2022 -0700

    Pre- and Post-interpolation hook interface (AMReX-Codes#2991)

    Support both Fab and MultiFab versions of pre- and post-interpolation
    hooks.

    Because the pre-interp hook might modify the data, we need to make a copy to
    avoid modifying cached coarse data.

    Close AMReX-Codes#2989.

commit 3082028
Author: Weiqun Zhang <[email protected]>
Date:   Wed Oct 19 19:24:10 2022 -0700

    Update GitHub Actions (AMReX-Codes#2996)

    https://github.blog/changelog/2022-09-22-github-actions-all-actions-will-begin-running-on-node16-instead-of-node12/

    ## Summary

    ## Additional background

    ## Checklist

    The proposed changes:
    - [ ] fix a bug or incorrect behavior in AMReX
    - [ ] add new capabilities to AMReX
    - [ ] changes answers in the test suite to more than roundoff level
    - [ ] are likely to significantly affect the results of downstream AMReX
    users
    - [ ] include documentation in the code and/or rst files, if appropriate

commit 0b88bfd
Author: Weiqun Zhang <[email protected]>
Date:   Wed Oct 19 13:39:18 2022 -0700

    Add user defined BC types (AMReX-Codes#2995)

    Add BCType::user_1, BCType::user_2 and BCType::user_3. Previously the
    only "user" type was ext_dir (external Dirichlet). The BC types are
    passed from the user's code to FillPatch, which in turn passes them back
    to the user provided BC filling function. These new types will make it
    easy for the user to determine the user defined BC types in their BC
    filling functions.

commit 9502b99
Author: Weiqun Zhang <[email protected]>
Date:   Tue Oct 18 10:20:06 2022 -0700

    Add BCRec::set for convenience (AMReX-Codes#2993)

commit 4dbfbac
Author: Thierry Antoun <[email protected]>
Date:   Mon Oct 17 15:05:54 2022 -0700

    Adding AMReX_RESTRICT for GPU Test

commit 7051a6c
Author: Thierry Antoun <[email protected]>
Date:   Mon Oct 17 15:03:19 2022 -0700

    Modifying RedistributeMPI to make it work with 2 ranks

commit 56b6402
Author: Weiqun Zhang <[email protected]>
Date:   Sat Oct 15 14:59:38 2022 -0700

    ParallelFor with compile time optimization of kernels with run time parameters (AMReX-Codes#2954)

    Branches inside ParallelFor can be very expensive. If a branch uses a
    lot of resources (e.g., registers), it can significantly affect the
    performance even if at run time the branch is never executed because it
    affects the GPU occupancy. For CPUs, it can affect vectorization of the
    kernel.

    The new ParallelFor functions use C++17 fold expression to generate
    kernel launches for all run time variants. Only one will be executed.
    Which one is chosen at run time depends on the run time parameters. The
    kernel function can use constexpr if to discard unused code blocks for
    better run time performance. Here are two examples of how to use them.

        int runtime_option = ...;
        enum All_options : int { A0, A1, A2, A3};
        // Four ParallelFors will be generated.
        ParallelFor(TypeList<CompileTimeOptions<A0,A1,A2,A3>>{},
                    {runtime_option},
                    box, [=] AMREX_GPU_DEVICE (int i, int j, int k, auto control)
        {
            ...
            if constexpr (control.value == A0) {
                ...
            } else if constexpr (control.value == A1) {
                ...
            } else if constexpr (control.value == A2) {
                ...
            } else {
                ...
            }
            ...
        });

    and

        int A_runtime_option = ...;
        int B_runtime_option = ...;
        enum A_options : int { A0, A1, A2, A3};
        enum B_options : int { B0, B1 };
        // 4*2=8 ParallelFors will be generated.
        ParallelFor(TypeList<CompileTimeOptions<A0,A1,A2,A3>,
                             CompileTimeOptions<B0,B1> > {},
                    {A_runtime_option, B_runtime_option},
                    N, [=] AMREX_GPU_DEVICE (int i, auto A_control, auto B_control)
        {
            ...
            if constexpr (A_control.value == A0) {
                ...
            } else if constexpr (A_control.value == A1) {
                ...
            } else if constexpr (A_control.value == A2) {
                ...
            } else {
                ...
            }
            if constexpr (A_control.value != A3 && B_control.value == B1) {
                ...
            }
            ...
        });

    Note that due to a limitation of CUDA's extended device lambda, the
    constexpr if block cannot be the one that captures a variable first. If
    nvcc complains about it, you will have to manually capture it outside
    constexpr if. The data type for the parameters is int.

    Thanks to Maikel Nadolski and Alex Sinn for showing us the meta-programming
    techniques used here.
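
The dispatch idea in this commit message (one instantiation per compile-time option, selected by a run-time value through a C++17 fold expression) can be sketched in plain C++ without AMReX. This is a minimal illustration, not AMReX's implementation: `OptionList`, `dispatch` and `kernel_result` are hypothetical names, and only a single option list is handled.

```cpp
#include <type_traits>

// Hypothetical stand-in for a list of compile-time options.
template <int... Vs>
struct OptionList {};

// Calls f with std::integral_constant<int, V> for the V that equals
// runtime_option. Returns false if no option matches.
template <int... Vs, typename F>
bool dispatch(OptionList<Vs...>, int runtime_option, F const& f)
{
    // Fold over ||: one call site per option is generated at compile time,
    // and short-circuiting stops after the first match at run time.
    return ((runtime_option == Vs
                 ? (f(std::integral_constant<int, Vs>{}), true)
                 : false) || ...);
}

enum Options : int { A0, A1, A2 };

inline int kernel_result(int runtime_option)
{
    int result = -1;  // stays -1 if runtime_option matches no option
    dispatch(OptionList<A0, A1, A2>{}, runtime_option, [&](auto control) {
        // Each instantiation of this lambda keeps exactly one branch.
        if constexpr (control.value == A0) {
            result = 10;
        } else if constexpr (control.value == A1) {
            result = 20;
        } else {
            result = 30;
        }
    });
    return result;
}
```

Because `control` carries its value in its type, the discarded `if constexpr` branches never reach the generated code, which is the property the commit message relies on for register usage and vectorization.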

commit bcbf17f
Author: Weiqun Zhang <[email protected]>
Date:   Fri Oct 14 19:48:14 2022 -0700

    2D RZ solver for WarpX: Arbitrary coefficient (AMReX-Codes#2986)

    The assumption in the 2D RZ solver for WarpX used to be that there was no
    sigma_r (i.e., sigma_r == 1). In this PR, we allow an arbitrary sigma_r
    coefficient.

commit 9a3cd5d
Author: Axel Huebl <[email protected]>
Date:   Fri Oct 14 17:27:41 2022 -0700

    CMake Docs: Fix User-Guidance (Link) (AMReX-Codes#2990)

    Update the user-guidance on CMake dependency linking to CMake 3.0+
    (anno. 2014+).

    Seen in AMReX-Codes#2978

commit 1ad4144
Author: Weiqun Zhang <[email protected]>
Date:   Fri Oct 14 10:36:17 2022 -0700

    Runge-Kutta support for AMR (AMReX-Codes#2974)

    This adds RK2, RK3 and RK4 in a new namespace RungeKutta. Together with
    the enhanced FillPatcher class, these functions can be used for RK time
    stepping in AMR simulations. A new function AmrLevel::RK is added for
    AmrLevel based codes. See CNS::advance in Tests/GPU/CNS/CNS_advance.cpp
    for an example of using the new AmrLevel::RK function.

    The main motivation for this PR is that ghost cell filling for high
    order (> 2) RK methods at coarse/fine boundary is non-trivial when there
    is subcycling.

    Co-authored-by: Jean M. Sexton <[email protected]>

commit c841ae8
Author: Weiqun Zhang <[email protected]>
Date:   Fri Oct 14 10:03:34 2022 -0700

    Fourth-order interpolation from fine to coarse level (AMReX-Codes#2987)

    For fourth-order finite-difference methods with data at cell centers, we
    cannot use the usual averageDown function to overwrite coarse level data
    with fine data. We actually need to do interpolation.

commit 975b830
Author: Weiqun Zhang <[email protected]>
Date:   Fri Oct 14 09:53:22 2022 -0700

    Fix EB data inconsistency when fixing small cells and multiple cuts (AMReX-Codes#2943)

    ## Summary

    For consistency, we need to call the function that zeros out the level
    set even if that box does not have any small cells or multiple cuts.
    This is because a node could exist in multiple boxes. Furthermore, a
    covered cell or covered face may have a node with a level set < 0.

    ## Additional background

    This is usually not an issue. However, in WarpX, we use the level set to
    decide whether a node is an unknown in the linear system. The
    inconsistency makes the solver fail in some cases.

commit 9c2264b
Author: Axel Huebl <[email protected]>
Date:   Fri Oct 14 07:41:06 2022 -0700

    `MFIter::Finalize`: Free `m_fa` (AMReX-Codes#2988)

    This `free` should potentially not be delayed until the destructor is
    called.

    Follow-up to AMReX-Codes#2985 AMReX-Codes#2983

commit f84c7a8
Author: Weiqun Zhang <[email protected]>
Date:   Wed Oct 12 10:44:11 2022 -0700

    Fix MLMG::getGradSolution & getFluxes for inhomogeneous Neumann and Robin BC (AMReX-Codes#2984)

    Because of the way inhomogeneous Neumann and Robin BCs are handled, we
    must add the inhomogeneous fluxes back, otherwise they would be zero at
    those boundaries.

commit ed1ecd6
Author: Axel Huebl <[email protected]>
Date:   Wed Oct 12 08:46:34 2022 -0700

    MFIter: Make Finalize Public (AMReX-Codes#2985)

    Follow-up to AMReX-Codes#2983

commit 5acfe07
Author: Axel Huebl <[email protected]>
Date:   Tue Oct 11 14:51:48 2022 -0700

    MFIter::Finalize (AMReX-Codes#2983)

    Add a Finalize function to MFIter.

    The idea is that we can call this before destruction in Python, where
    `for` loops do not create a scope.

    This function must be robust enough to be called again in the
    destructor (or we need to add an extra bool to guard that it is not
    called again in the destructor).

    Co-authored-by: Weiqun Zhang <[email protected]>
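
The "extra bool" guard mentioned in this commit message can be sketched as follows. This is a generic illustration of an idempotent finalize step, assuming a hypothetical `ScopedIter` class; it is not the actual `MFIter` code.

```cpp
// Hypothetical iterator-like class that releases a resource when it is done.
class ScopedIter
{
public:
    explicit ScopedIter(int* release_counter) : m_counter(release_counter) {}
    ~ScopedIter() { Finalize(); }

    ScopedIter(ScopedIter const&) = delete;
    ScopedIter& operator=(ScopedIter const&) = delete;

    // Safe to call more than once: only the first call does any work.
    void Finalize()
    {
        if (m_finalized) { return; }
        m_finalized = true;
        ++(*m_counter);  // stands in for freeing resources
    }

private:
    int* m_counter;
    bool m_finalized = false;
};

inline int count_releases()
{
    int releases = 0;
    {
        ScopedIter it(&releases);
        it.Finalize();  // explicit early call, as a language binding might do
    }                   // the destructor calls Finalize() again; it is a no-op
    return releases;
}
```

With the guard in place, callers that never call `Finalize()` explicitly still get the cleanup from the destructor, so existing C++ code is unaffected.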

commit 53e34d1
Author: Andy Nonaka <[email protected]>
Date:   Tue Oct 11 12:00:34 2022 -0700

    fix docs; Robin BC's for MLMG (AMReX-Codes#2982)

    Update the MLMG Robin BC description in the docs.

commit 0019b3a
Author: Weiqun Zhang <[email protected]>
Date:   Tue Oct 11 11:00:13 2022 -0700

    MLLinOp::postSolve (AMReX-Codes#2981)

    Add a virtual function MLLinOp::postSolve. This allows WarpX to set EB
    covered nodes to prescribed values in the solver's output for
    visualization purpose.

commit 2d87a4c
Author: Brandon Runnels <[email protected]>
Date:   Mon Oct 10 09:49:29 2022 -0600

    add templating for the cell bilinear interpolators (AMReX-Codes#2979)

    This templates the `mf_cell_bilin_interp` functions so that the
    interpolators can be used with `BaseFab`s of arbitrary type.

commit e4ab048
Author: Weiqun Zhang <[email protected]>
Date:   Wed Oct 5 12:03:41 2022 -0700

    FillPatcher class (AMReX-Codes#2972)

    This adds a class FillPatcher for filling fine level data. It's not as
    general as the various FillPatch functions (e.g., FillPatchTwoLevels).
    However, it can reduce the amount of communication data. Suppose we use
    RK2 with subcycling and the refinement ratio is 2. For each step on
    level 0, there are two steps on level 1. With RK2, each fine step needs
    to call FillPatch twice. So the total number of FillPatch calls is 4 in
    the two fine steps. Using the free function, one ParallelCopy per
    FillPatch call is needed for copying coarse data for spatial
    interpolation. With the FillPatcher class, two ParallelCopy calls will
    be done to copy old and new coarse data. Then these data will be used in
    the four FillPatcher::fill calls. This new approach saves two
    ParallelCopy calls per coarse step for a two levels run. It could save
    more if the time stepping requires more substeps or the refinement ratio
    is higher. Note that many of our AMReX codes use a time stepping
    algorithm that needs only one FillPatch call per step. For those codes,
    this new approach will not save any communication for a refinement ratio
    of 2. However, it will save communication when the refinement ratio is
    4.
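
The counting argument above can be written down as a small calculation. The sketch assumes a two-level run with subcycling, so that the number of fine steps per coarse step equals the refinement ratio; the struct and function names are hypothetical.

```cpp
struct CopyCounts
{
    int free_function;  // ParallelCopy calls when using the FillPatch free functions
    int fill_patcher;   // ParallelCopy calls when coarse data are cached
};

// ParallelCopy calls per coarse step for a two-level run with subcycling.
inline CopyCounts parallel_copies_per_coarse_step(int fillpatch_calls_per_fine_step,
                                                  int ref_ratio)
{
    int const fine_steps = ref_ratio;  // subcycling assumption
    int const fillpatch_calls = fillpatch_calls_per_fine_step * fine_steps;
    // Free functions: one ParallelCopy per FillPatch call.
    // Cached approach: old and new coarse data are each copied once.
    return CopyCounts{fillpatch_calls, 2};
}
```

For RK2 with a refinement ratio of 2 this gives 4 versus 2 copies; for one FillPatch call per step it gives 2 versus 2 at ratio 2 and 4 versus 2 at ratio 4, matching the cases discussed in the commit message.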

commit 1bc4e4e
Author: Weiqun Zhang <[email protected]>
Date:   Mon Oct 3 16:50:45 2022 -0700

    Remove sycl namespace alias (AMReX-Codes#2971)

    This causes a conflict with new compilers.

commit de7b7f4
Author: Weiqun Zhang <[email protected]>
Date:   Mon Oct 3 14:06:58 2022 -0700

    Fix Tensor Solver BC (AMReX-Codes#2930)

    This fixes some bugs in the physical domain BC of tensor linear solver.

    At the corner of two no-slip walls (e.g., (0,0)), we have u(-1,0) = -u(0,0)
    and u(0,-1) = -u(0,0). It's incorrect to fill the corner ghost cell with
    u(-1,-1) = u(-1,0) + u(0,-1) - u(0,0), because it will result in
    u(-1,-1) = -3 * u(0,0).

    In the old approach, to avoid branches in computing transverse derivatives
    on cell faces, we fill the ghost cells first. For example, to compute du/dy
    at the lo-x boundary, we use the data in i = -1 and 0, just like we compute
    du/dy(i) using u(i-1) and u(i) for interior faces. The problem is the
    normal velocity in the ghost cells outside a wall is filled with
    extrapolation of the Dirichlet value (which is zero) and more than one
    interior cell. Because of the high-order extrapolation, u(-1) != -u(0).
    This is the desired approach for computing du/dx on the wall. However, this
    produces incorrect results in du/dy.

    In the new approach, we explicitly handle the boundaries in the derivative
    stencil. For example, to compute transverse derivatives on an inflow face,
    we use the boundary values directly.

    Co-authored-by: cgilet <[email protected]>

commit 13aa4df
Author: Weiqun Zhang <[email protected]>
Date:   Fri Sep 30 17:48:22 2022 -0700

    Disable host device for macros for SYCL/DPC++ (AMReX-Codes#2969)

    The host part of the AMREX_HOST_DEVICE_FOR_* macros is disabled for
    SYCL/DPC++. It's really slow for compilation.

commit 62379fb
Author: Weiqun Zhang <[email protected]>
Date:   Fri Sep 30 15:37:35 2022 -0700

    Update CHANGES for 22.10 (AMReX-Codes#2968)

commit d65e09e
Author: Roberto Porcu <[email protected]>
Date:   Thu Sep 29 15:46:19 2022 -0400

    Solve an issue with particles async IO when having runtime added variables (AMReX-Codes#2966)

commit cd07b0d
Author: Weiqun Zhang <[email protected]>
Date:   Wed Sep 28 09:20:42 2022 -0700

    Fix int overflow in amrex::bisect (AMReX-Codes#2964)

    Change from (lo+hi)/2 to lo+(hi-lo)/2.  Although it's very unlikely, it's
    possible (lo+hi), where both lo and hi are integers, could overflow.
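
The difference between the two midpoint formulas only shows up near the top of the integer range. A minimal sketch, assuming `0 <= lo <= hi`; the function names are hypothetical and this is not the `amrex::bisect` source.

```cpp
#include <limits>

// Midpoint of [lo, hi] for 0 <= lo <= hi, computed without forming lo + hi.
inline int safe_midpoint(int lo, int hi)
{
    return lo + (hi - lo) / 2;
}

inline int midpoint_near_int_max()
{
    int const hi = std::numeric_limits<int>::max();
    int const lo = hi - 10;
    // (lo + hi) / 2 would overflow here, which is undefined behavior for
    // signed integers. hi - lo is only 10, so the form below stays in range.
    return safe_midpoint(lo, hi);
}
```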

commit e55d6b4
Author: Junghyeon Park <[email protected]>
Date:   Thu Sep 29 01:20:15 2022 +0900

    Update the SWFFT project site (AMReX-Codes#2965)

commit b84d7c0
Author: Weiqun Zhang <[email protected]>
Date:   Mon Sep 26 16:05:10 2022 -0700

    Fix MLEBNodeFDLaplacian bottom solver (AMReX-Codes#2963)

    MLEBNodeFDLaplacian is never singular because it has Dirichlet boundary on
    the EB surface.  We did set the singular flag to false, but forgot that the
    bottom solver uses a different function to query it.  This fixes it by
    overriding the isBottomSingular function.

commit 5e84f43
Author: asalmgren <[email protected]>
Date:   Sun Sep 25 09:38:51 2022 -0700

    make tagging routines EB_aware (AMReX-Codes#2962)

commit 8b367b0
Author: Weiqun Zhang <[email protected]>
Date:   Sun Sep 25 09:22:13 2022 -0700

    Volume weighted sum (AMReX-Codes#2961)

    Add a new function doing volume weighted sum across AMR levels.  This may
    not be exactly what amrex application codes want.  But it should work for
    many cases.

commit 2a3cc05
Author: Weiqun Zhang <[email protected]>
Date:   Fri Sep 23 12:24:05 2022 -0700

    CellData: data in a single cell (AMReX-Codes#2959)

    This adds struct CellData that allows for accessing data in a single cell in
    Array4.  This is convenient sometimes because one can omit the i, j and k
    indices.  It might also be faster sometimes because it can skip the repeated
    index calculation involving i,j,k.

commit 27ef106
Author: Weiqun Zhang <[email protected]>
Date:   Fri Sep 23 12:23:34 2022 -0700

    Quartic interpolation for cell centered data (AMReX-Codes#2960)

    New Interpolator for interpolation of cell centered data using a
    fourth-degree polynomial.  Note that the interpolation is not conservative
    and does not do any slope limiting.

commit c4b7982
Author: Luca Fedeli <[email protected]>
Date:   Fri Sep 23 21:17:12 2022 +0200

    Add GPU-compatible upper bound and lower bound algorithms to AMReX_Algorithm (AMReX-Codes#2958)

commit 3e5cc77
Author: Don E. Willcox <[email protected]>
Date:   Tue Sep 20 17:59:48 2022 -0700

    add option for makebuildsources to specify the style arguments for 'git describe'. (AMReX-Codes#2957)

commit a6e0c11
Author: Weiqun Zhang <[email protected]>
Date:   Tue Sep 20 10:01:21 2022 -0700

    Add more warnings (AMReX-Codes#2956)

    * Add -Wnon-virtual-dtor -Wlogical-op -Wmisleading-indentation
      -Wduplicated-cond -Wduplicated-branches to gcc.

    * Add -Wnon-virtual-dtor to clang.

    * Add more warnings to CI.

    * Fix some non-virtual dtors and some other warnings.

commit 826cd37
Author: Phil Miller <[email protected]>
Date:   Thu Sep 15 17:26:00 2022 -0700

    Add roundoff_lo corresponding to roundoff_hi for domains that don't start at 0 (AMReX-Codes#2950)

    * Lay groundwork for roundoff_lo

    * Add dummy implementation of roundoff_lo computation

    * implement bisect_prob_lo

    * change idx -> dxinv

    * use rlo instead of plo in locateParticle

    Co-authored-by: atmyers <[email protected]>

commit 6a5a056
Author: Weiqun Zhang <[email protected]>
Date:   Thu Sep 15 13:23:40 2022 -0700

    Add template parameter to ParallelFor and launch specifying block size (AMReX-Codes#2947)

    By default, amrex::ParallelFor launches AMREX_GPU_MAX_THREADS threads per
    block. We can now explicitly specify the block size with
    `ParallelFor<BLOCK_SIZE>(...)`, where BLOCK_SIZE should be a multiple of the
    warp size (e.g., 64, 128, etc.).  A similar change has also been made to
    `launch`.

    The changes are backward compatible.
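
How a compile-time block size with a default keeps existing calls working can be sketched on the host side. The constants and the function below are assumptions for illustration only: the real warp size and the value of AMREX_GPU_MAX_THREADS depend on the backend and build configuration, and `blocks_needed` is a hypothetical helper, not an AMReX function.

```cpp
// Assumed values for illustration only.
constexpr int assumed_warp_size = 32;
constexpr int assumed_default_block_size = 256;

// Number of thread blocks needed to cover n items. The block size is a
// template parameter with a default, so calls without <...> still compile.
template <int BlockSize = assumed_default_block_size>
constexpr long blocks_needed(long n)
{
    static_assert(BlockSize > 0 && BlockSize % assumed_warp_size == 0,
                  "block size should be a positive multiple of the warp size");
    return (n + BlockSize - 1) / BlockSize;
}
```

A call such as `blocks_needed(1000)` uses the default, while `blocks_needed<128>(1000)` selects a smaller block, mirroring the `ParallelFor<BLOCK_SIZE>(...)` form described above.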

commit 2cdb9df
Author: Andrew Myers <[email protected]>
Date:   Thu Sep 15 10:55:41 2022 -0700

    Byte spread fixes (AMReX-Codes#2949)

commit 17c94cc
Author: Candace Gilet <[email protected]>
Date:   Wed Sep 14 11:49:35 2022 -0400

    Correct MultiFab::norm0 doxygen brief description (AMReX-Codes#2946)

commit 0351c99
Author: Axel Huebl <[email protected]>
Date:   Wed Sep 14 08:48:25 2022 -0700

    CMake: HIP_PATH from ROCM_PATH (AMReX-Codes#2948)

    * On machines like Crusher, `ROCM_PATH` is more likely to be available
    than a `HIP_PATH` environment variable.

    This is mainly needed for our hacky ROCTX hints.

    * ROCTX: New Include

    Supposedly, there is a new include we shall use:

    Ref.:
    ROCm/roctracer#79

    * ROCtracer: Include as System library

    Because of GNU extensions in the roctracer include files for the legacy include.
    But we should make this `-isystem` anyway to be robust for the future.

    The 5.2 deprecated include file `<roctracer_ext.h>` throws warnings
    because they rely on GNU extensions:
    ```
    In file included from /opt/rocm/hip/../roctracer/include/ext/prof_protocol.h:27:
    /opt/rocm/hip/../roctracer/include/ext/../../../include/roctracer/ext/prof_protocol.h:70:7: warning: anonymous structs are a GNU extension [-Wgnu-anonymous-struct]
          struct {
          ^
    /opt/rocm/hip/../roctracer/include/ext/../../../include/roctracer/ext/prof_protocol.h:70:7: warning: anonymous types declared in an anonymous union are an extension [-Wnested-anon-types]
    /opt/rocm/hip/../roctracer/include/ext/../../../include/roctracer/ext/prof_protocol.h:75:7: warning: anonymous types declared in an anonymous union are an extension [-Wnested-anon-types]
          struct {
          ^
    /opt/rocm/hip/../roctracer/include/ext/../../../include/roctracer/ext/prof_protocol.h:82:7: warning: anonymous structs are a GNU extension [-Wgnu-anonymous-struct]
          struct {
          ^
    /opt/rocm/hip/../roctracer/include/ext/../../../include/roctracer/ext/prof_protocol.h:86:7: warning: anonymous structs are a GNU extension [-Wgnu-anonymous-struct]
          struct {
          ^
    /opt/rocm/hip/../roctracer/include/ext/../../../include/roctracer/ext/prof_protocol.h:90:7: warning: anonymous structs are a GNU extension [-Wgnu-anonymous-struct]
          struct {
          ^
    /opt/rocm/hip/../roctracer/include/ext/../../../include/roctracer/ext/prof_protocol.h:82:7: warning: anonymous types declared in an anonymous union are an extension [-Wnested-anon-types]
          struct {
          ^
    /opt/rocm/hip/../roctracer/include/ext/../../../include/roctracer/ext/prof_protocol.h:86:7: warning: anonymous types declared in an anonymous union are an extension [-Wnested-anon-types]
          struct {
          ^
    /opt/rocm/hip/../roctracer/include/ext/../../../include/roctracer/ext/prof_protocol.h:90:7: warning: anonymous types declared in an anonymous union are an extension [-Wnested-anon-types]
          struct {
          ^
    ```

    * GNUmake: Update Includes in `hip.mak`

    Use public prefix.

commit 9aa23c2
Author: Cody Balos <[email protected]>
Date:   Mon Sep 12 11:49:37 2022 -0700

    Fix minor typo in fcompare docs (AMReX-Codes#2945)

commit bfbd68f
Author: Axel Huebl <[email protected]>
Date:   Mon Sep 12 11:40:55 2022 -0700

    Fix: Make Finalize->Initialize->F->I->... Work (AMReX-Codes#2944)

    Fix assertions in Arena::Initialize.  The_BArena never dies (tm)

    Co-authored-by: Weiqun Zhang <[email protected]>

commit 6738470
Author: Weiqun Zhang <[email protected]>
Date:   Wed Sep 7 14:12:34 2022 -0700

    Changes for Cray & Clang (AMReX-Codes#2941)

    * It seems that the new Cray compilers no longer define `_CRAYC`.  However, they do define
      `__cray__`.

    * For Clang based Cray compilers, use -O3 instead of -O2 for optimization.

    * Clang's vectorization pragma is very aggressive.  For some codes, it makes ParallelFor
      with many if statements on CPU much slower than without vectorization.  Unfortunately,
      it does not have an ivdep pragma.  So we disable AMREX_PRAGMA for clang for safety.

    * No longer need to use -Wno-pass-failed for Clang based compilers.

commit 5b0c598
Author: Weiqun Zhang <[email protected]>
Date:   Wed Sep 7 09:42:57 2022 -0700

    Fix a warning in packing communication send buffer (AMReX-Codes#2940)

    When we communicate double precision data in single precision, there is a
    conversion from double to float in packing the send buffer.  A static cast
    is added to fix the warning.
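
The kind of conversion involved can be sketched as below. This is a generic illustration with a hypothetical `pack_as_float` helper, not the AMReX packing routine; the point is only that the narrowing is spelled out so that conversion warnings are not triggered.

```cpp
#include <vector>

// Packs double precision values into a single precision send buffer.
inline std::vector<float> pack_as_float(std::vector<double> const& src)
{
    std::vector<float> buffer;
    buffer.reserve(src.size());
    for (double x : src) {
        // Explicit cast: the loss of precision is intentional here.
        buffer.push_back(static_cast<float>(x));
    }
    return buffer;
}
```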

commit 3e397bb
Author: Weiqun Zhang <[email protected]>
Date:   Wed Sep 7 09:13:53 2022 -0700

    Link to cublas when using CUDA and Hypre (AMReX-Codes#2933)

commit 9525ea8
Author: Weiqun Zhang <[email protected]>
Date:   Wed Sep 7 09:13:20 2022 -0700

    HIP: use coarse grained host memory (AMReX-Codes#2932)

commit 7e04016
Author: Marco Garten <[email protected]>
Date:   Wed Sep 7 08:53:20 2022 -0700

    Update Testing Docs (AMReX-Codes#2937)

    - document `abort_on_unused_inputs`
    - remove duplicate superfluous argument in regtest call

commit 539427a
Author: drangara <[email protected]>
Date:   Tue Sep 6 18:13:42 2022 -0400

    EB checkpoint files (AMReX-Codes#2897)

    * support for loading EB from checkpoint file

    * add support for writing chkpt file as well

    Co-authored-by: Weiqun Zhang <[email protected]>

commit 35ed6b4
Author: Axel Huebl <[email protected]>
Date:   Tue Sep 6 15:07:16 2022 -0700

    Fix: Loading Files Again (AMReX-Codes#2936)

    This allows `amrex::ParmParse::addfile` to be called
    multiple times. Before this, we accidentally overwrote the
    `FILE` static keyword.

    Follow-up to AMReX-Codes#2842

commit 8f8198c
Author: hengjiew <[email protected]>
Date:   Tue Sep 6 13:36:35 2022 -0400

    Check if boundary particles container has been created before clearance. (AMReX-Codes#2935)

    This fixes a segmentation fault when using more GPUs for updating particles
    than fluid.

commit fb0b31e
Author: Nuno Miguel Nobre <[email protected]>
Date:   Sun Sep 4 05:18:49 2022 +0100

    SYCL: Replace deprecated atomic types and operations (AMReX-Codes#2921)

    * SYCL: Replace deprecated atomic types and operations

    * Change atomic refs to device memory scope

    When using the relaxed memory order, the memory scope is ignored.
    Thus, for cosmetic reasons only, we set the memory scope to device, the broadest option when using the global address space.

    Co-authored-by: Weiqun Zhang <[email protected]>

commit cc3cd14
Author: Weiqun Zhang <[email protected]>
Date:   Thu Sep 1 07:39:25 2022 -0700

    Update CHANGES for 22.09 (AMReX-Codes#2934)

commit acc223f
Author: Weiqun Zhang <[email protected]>
Date:   Tue Aug 30 16:04:43 2022 -0700

    Add hypre as an option for OpenBCSolver (AMReX-Codes#2931)

commit 3d29fd7
Author: hengjiew <[email protected]>
Date:   Wed Aug 24 16:10:22 2022 -0400

    Preserve neighbor particles when sorting particles. (AMReX-Codes#2923)

commit 8294c3a
Author: Weiqun Zhang <[email protected]>
Date:   Mon Aug 22 10:46:05 2022 -0700

    Scope of NonLocalBC::ParallelCopy (AMReX-Codes#2922)

    Make NonLocalBC::ParallelCopy accessible in namespace amrex, because it can
    be useful in situations other than non-local BC.

commit 0911fc4
Author: Weiqun Zhang <[email protected]>
Date:   Sun Aug 21 18:13:07 2022 -0700

    Open Boundary Poisson Solver (AMReX-Codes#2912)

    This adds an open boundary Poisson solver based on James's algorithm.
    To use it, the user builds an amrex::OpenBCSolver object, which can be reused
    until the grids change, and then calls OpenBCSolver::solve.

    Currently, this is for 3D cell-centered data only. The solver works on CPUs,
    Nvidia GPUs, and AMD GPUs.  The SYCL versions of a couple of kernels for
    Intel GPUs are yet to be implemented.

commit f270b3d
Author: Marc T. Henry de Frahan <[email protected]>
Date:   Thu Aug 18 13:51:56 2022 -0600

    Fix OOB access of ref ratio on HDF write header (AMReX-Codes#2919)

commit fa8e20f
Author: Jean M. Sexton <[email protected]>
Date:   Thu Aug 18 08:57:51 2022 -0700

    Add Polaris to GNUMake (AMReX-Codes#2908)

commit bd5f6a9
Author: Axel Huebl <[email protected]>
Date:   Mon Aug 15 14:24:21 2022 -0700

    Export GpuDevice Globals (AMReX-Codes#2918)

    * Export GpuDevice Globals

    Implement symbol export via `AMREX_EXPORT` for the global variables
    in `Src/Base/AMReX_GpuDevice.H`.

    Follow-up to AMReX-Codes#1847 AMReX-Codes#1847

    Fix AMReX-Codes#2917

    * Fix: Export `AMReX::m_instance`

commit 4f63929
Author: asalmgren <[email protected]>
Date:   Sat Aug 13 09:00:02 2022 -0700

    enable LinOp to use the right Factory (fixes moving geometry problem) (AMReX-Codes#2916)

commit 6593518
Author: Andrew Myers <[email protected]>
Date:   Thu Aug 11 15:24:16 2022 -0700

    Use 1 atomic instead of two per item in DenseBins::build (AMReX-Codes#2911)

commit d295f22
Author: Nuno Miguel Nobre <[email protected]>
Date:   Thu Aug 11 03:40:09 2022 +0100

    [SYCL] Remove amrex::oneapi and update deprecated device descriptors (AMReX-Codes#2910)

    * Remove amrex::oneapi in favour of standard features

    * Change deprecated device descriptors

commit 1bda173
Author: Axel Huebl <[email protected]>
Date:   Wed Aug 10 15:46:43 2022 -0600

    Add: `MultiFab::sum_unique` (AMReX-Codes#2909)

    This provides a new method to sum values in a `MultiFab`.
    For non-cell-centered data, `MultiFab::sum` double counts box
    boundary values that are owned by multiple boxes. This provides
    a function that does not double count these and provides a
    quick way to get only the sum of physically unique values.

    Co-authored-by: Weiqun Zhang <[email protected]>
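
    The double counting described above can be illustrated without AMReX.
    The following is a minimal sketch, assuming 1D nodal data on adjacent
    boxes that share a boundary node.  `NodalBox`, `naive_sum`, and
    `unique_sum` are hypothetical names, not the AMReX API, and the ownership
    rule (the first box containing a node owns it) is an assumption made for
    illustration only.

    ```cpp
    #include <cassert>
    #include <set>
    #include <vector>

    // Hypothetical stand-in for a 1D nodal box: nodes lo..hi (inclusive),
    // with one value stored per node.
    struct NodalBox {
        int lo;
        int hi;
        std::vector<double> data; // data[i - lo] is the value at node i
    };

    // Sums every stored value, so a node shared by two boxes is counted twice.
    double naive_sum(const std::vector<NodalBox>& boxes) {
        double s = 0.0;
        for (const auto& b : boxes) {
            for (double v : b.data) { s += v; }
        }
        return s;
    }

    // Sums each physical node once: the first box that contains a node
    // "owns" it (an assumed ownership rule for this sketch).
    double unique_sum(const std::vector<NodalBox>& boxes) {
        std::set<int> seen;
        double s = 0.0;
        for (const auto& b : boxes) {
            assert(static_cast<int>(b.data.size()) == b.hi - b.lo + 1);
            for (int i = b.lo; i <= b.hi; ++i) {
                if (seen.insert(i).second) { s += b.data[i - b.lo]; }
            }
        }
        return s;
    }
    ```

    With two boxes covering nodes 0..2 and 2..4 and every value set to 1, the
    naive sum is 6 while the unique sum is 5, because node 2 is stored in
    both boxes.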

commit 3f715d2
Author: Candace Gilet <[email protected]>
Date:   Mon Aug 8 14:40:28 2022 -0400

    In MLMG::mgFcycle, assert that for EB the linop is cell-centered. (AMReX-Codes#2905)

commit 59b0742
Author: hengjiew <[email protected]>
Date:   Mon Aug 8 14:17:57 2022 -0400

    Clear the boundary particle indices' container before updating it. (AMReX-Codes#2907)

    This avoids potential segmentation faults when one grid's particles all
    move to other grids.

commit 103db6e
Author: Weiqun Zhang <[email protected]>
Date:   Fri Aug 5 15:25:33 2022 -0700

    EB: Add Fine Levels (AMReX-Codes#2881)

    Add a new function EB2::addFineLevels() that can be used to add more fine
    levels to the existing EB IndexSpace without changing the coarse levels.
    This is useful for restarting with a larger amr.max_level.  The issue is we
    build EB at the finest level first and then coarsen it to the coarse levels.
    If the restart run has a different finest level, the EB on the coarse levels
    could be different without using this new capability.

commit 6ebf8ff
Author: Jon Rood <[email protected]>
Date:   Thu Aug 4 14:32:59 2022 -0600

    Add rpath to lib64 for ZFP. (AMReX-Codes#2902)

commit ed23627
Author: Yadong_Zeng <[email protected]>
Date:   Thu Aug 4 16:32:21 2022 -0400

    change data types from double to amrex::Real, and thus we can use single precision for the hypre IJ interface (AMReX-Codes#2896)

    Co-authored-by: yzeng <[email protected]>

commit 9ed4f59
Author: Weiqun Zhang <[email protected]>
Date:   Wed Aug 3 16:53:20 2022 -0700

    Fix a new bug introduced in AMReX-Codes#2858 (AMReX-Codes#2901)

    We need to take into account that `amrex::Any` stores `MultiFab&` or `MultiFab const&`.

commit 6eaab8c
Author: Weiqun Zhang <[email protected]>
Date:   Wed Aug 3 13:39:44 2022 -0700

    MPMD Support (AMReX-Codes#2895)

    Add support for multiple programs multiple data (MPMD).  For now, we assume
    there are only two programs (i.e., executables) in the MPMD mode.  During
    the initialization, MPI_COMM_WORLD is split into two communicators.  The
    MPMD::Copier class can be used to copy FabArray/MultiFab data between two
    programs.  This new capability can be used by FHDeX to couple FHD with
    SPPARKS.

commit 9469329
Author: Weiqun Zhang <[email protected]>
Date:   Mon Aug 1 09:43:21 2022 -0700

    MLMG interface (AMReX-Codes#2858)

    These changes are made to support a generic type (i.e., amrex::Any) in MLMG.
    This is still work in progress.  But it should not break any existing codes.

commit 5a3b303
Author: Weiqun Zhang <[email protected]>
Date:   Mon Aug 1 09:34:44 2022 -0700

    Update CHANGES for 22.08 (AMReX-Codes#2894)

commit 48702b4
Author: hengjiew <[email protected]>
Date:   Thu Jul 28 14:14:19 2022 -0400

    Let `selectActualNeighbors` return right after starting if there are (AMReX-Codes#2886)

    no particles for communication.

commit 6a47d89
Author: kngott <[email protected]>
Date:   Wed Jul 27 17:03:04 2022 -0700

    Add Comm Sync to Redistribute (AMReX-Codes#2891)

commit 51542c8
Author: philip-blakely <[email protected]>
Date:   Wed Jul 27 17:29:26 2022 +0100

    Multi-materials and derived variable output (AMReX-Codes#2888)

    ## Summary

    Output small plots if only derived variables are specified.
    Also, make DeriveFuncFab a std::function<> instead of plain function-pointer.

    ## Additional background

    We have been implementing small-plots for outputting variables at gauges (e.g. pressure at specific gauge locations). We may want to output the derived variable pressure only, and not all state-variables. The if-condition was incorrect in this case.

    Further, multi-material simulations require a material index in order to compute derived variables, in addition to existing parameters. Making DeriveFuncFab a std::function is sufficient for our purposes.
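
    Why a std::function helps here can be shown with a small standalone
    sketch.  The signatures below are hypothetical and much simpler than the
    real DeriveFuncFab, and the gamma values are placeholders chosen for
    exact arithmetic, not physical constants.  The point is only that a
    callable carrying extra state (such as a material index) cannot be
    stored in a plain function pointer.

    ```cpp
    #include <cassert>
    #include <functional>

    // Hypothetical, simplified signatures; the real DeriveFuncFab takes more
    // arguments than shown here.
    using PlainDeriveFunc = double (*)(double density, double energy);
    using FlexibleDeriveFunc = std::function<double(double density, double energy)>;

    // A free function without state fits both types.
    inline double single_material_pressure(double density, double energy) {
        return 0.5 * density * energy; // (gamma - 1) * rho * e with gamma = 1.5
    }

    // A multi-material derive function needs extra state (the material index).
    // A capturing lambda can carry it, but only std::function can store it.
    inline FlexibleDeriveFunc make_pressure_func(int material) {
        assert(material == 0 || material == 1);
        const double gamma = (material == 0) ? 1.5 : 2.0; // placeholder values
        return [gamma](double density, double energy) {
            return (gamma - 1.0) * density * energy;
        };
    }
    ```

    Assigning the result of `make_pressure_func` to a `PlainDeriveFunc` would
    not compile, since a capturing lambda has no conversion to a function
    pointer.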

commit ce0fb74
Author: Andrew Myers <[email protected]>
Date:   Tue Jul 26 16:20:38 2022 -0700

    Fix host / device sync bug in PODVector (AMReX-Codes#2890)

commit 06753e6
Author: Axel Huebl <[email protected]>
Date:   Tue Jul 26 12:54:35 2022 -0700

    `TagBoxArray::collate`: Fujitsu Clang (AMReX-Codes#2889)

    `mpiFCC -Nclang` only defines `__CLANG_FUJITSU`, not `__FUJITSU` as
    in the classic compiler mode.

commit 7cf77dc
Author: Weiqun Zhang <[email protected]>
Date:   Tue Jul 26 11:01:21 2022 -0700

    MinLoc and MaxLoc Support (AMReX-Codes#2885)

    Add struct ValLocPair that can be used by ReduceOps/ReduceData and ParReduce
    to find the location of the min/max value.

    Add warp shuffle down function for more general types.  This is needed for
    MinLoc/MaxLoc with CUDA < 11, because we don't use CUB for earlier versions
    of CUDA.

    The Intel GPU support is not done yet.  We need to allocate enough shared
    local memory when the size of ValLocPair is larger than the size of unsigned
    long long.
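
    The idea of a value-location pair can be sketched on the host in standard
    C++.  This is not the AMReX ReduceOps/ParReduce interface; `ValLoc`,
    `min_op`, and `min_loc` are hypothetical names, and the tie-breaking rule
    (keep the earlier location) is an assumption of this sketch.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <vector>

    // Hypothetical analogue of a value-location pair: the comparison looks
    // only at the value, while the location is carried along.
    struct ValLoc {
        double value;
        std::size_t index;
    };

    // Combine step of a min reduction; ties keep the left operand.
    inline ValLoc min_op(const ValLoc& a, const ValLoc& b) {
        return (b.value < a.value) ? b : a;
    }

    // Serial reduction over a non-empty array.
    ValLoc min_loc(const std::vector<double>& v) {
        assert(!v.empty());
        ValLoc best{v[0], 0};
        for (std::size_t i = 1; i < v.size(); ++i) {
            best = min_op(best, ValLoc{v[i], i});
        }
        return best;
    }
    ```

    Because the combined object is a struct rather than a built-in scalar, a
    GPU reduction has to move the whole pair between threads, which is
    presumably why the shuffle function for more general types is needed.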

commit 4b7e200
Author: Weiqun Zhang <[email protected]>
Date:   Thu Jul 21 10:25:57 2022 -0700

    HIP: Remove the call to hipDeviceSetSharedMemConfig (AMReX-Codes#2884)

    AMD devices do not support shared cache banking.

    Thanks @afanfa for reporting this. (AMReX-Codes#2883)

commit 8e40952
Author: Weiqun Zhang <[email protected]>
Date:   Wed Jul 20 12:10:26 2022 -0700

    Add Frontier to GNU Make (AMReX-Codes#2879)

commit b673d81
Author: Max Katz <[email protected]>
Date:   Mon Jul 18 15:14:19 2022 -0400

    Add option to derefine to AMRErrorTag (AMReX-Codes#2875)

    This allows a refinement field to specify *derefinement* (by setting a zone's tagging value to the clear value).

commit 73dbf2f
Author: hengjiew <[email protected]>
Date:   Mon Jul 18 12:53:35 2022 -0400

    Fix the segmentation fault in selecting actual neighbor particles. (AMReX-Codes#2877)

commit 40b3d21
Author: Weiqun Zhang <[email protected]>
Date:   Wed Jul 13 13:24:15 2022 -0700

    Add extra braces in initialization of GpuArray (AMReX-Codes#2876)

    It should not be needed since C++14.  But some compilers seem to need the
    double braces.
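
    The double-brace form can be illustrated with a small aggregate that
    wraps a C array.  `FixedArray` and `make_triplet` are hypothetical names
    used only for this sketch; this is not the actual GpuArray definition.

    ```cpp
    #include <cstddef>

    // Hypothetical aggregate wrapping a C array, similar in spirit to a
    // fixed-size array class.
    template <class T, std::size_t N>
    struct FixedArray {
        T arr[N];
        constexpr const T& operator[](std::size_t i) const { return arr[i]; }
    };

    // The outer braces initialize the struct and the inner braces initialize
    // the member array.  Spelling out both avoids relying on brace elision.
    constexpr FixedArray<int, 3> make_triplet() {
        return FixedArray<int, 3>{{1, 2, 3}};
    }

    static_assert(make_triplet()[1] == 2, "middle element");
    ```

    The single-brace form `FixedArray<int, 3>{1, 2, 3}` relies on brace
    elision, which is the part that some compilers appear to handle less
    reliably.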

commit a633d2b
Author: Luca Fedeli <[email protected]>
Date:   Fri Jul 8 20:34:18 2022 +0200

    Workaround to bypass issue observed at very large scale with Fujitsu MPI (AMReX-Codes#2874)

    We have observed some MPI issues at very large scale when WarpX is compiled using Fujitsu MPI (i.e., with the Fujitsu compiler). These issues seem to be related to the use of MPI Gatherv with MPI_Datatype. This PR implements a possible workaround, initially proposed by @WeiqunZhang . The idea is that, when WarpX is compiled with the Fujitsu compiler, simpler integer arrays instead of MPI_Datatype are used in the routine where the issue was observed.

commit 7660c88
Author: Weiqun Zhang <[email protected]>
Date:   Fri Jul 8 08:48:14 2022 -0700

    Allow zero components MultiFab and BaseFab (AMReX-Codes#2873)

    This is useful for particle I/O that does not have any mesh data.  yt needs
    a header file associated with a MultiFab.

commit c849dd1
Author: Weiqun Zhang <[email protected]>
Date:   Fri Jul 8 08:06:37 2022 -0700

    New EB optimization parameter: eb2.num_coarsen_opt (AMReX-Codes#2872)

    At the beginning of EB generation, we chop the entire finest domain into
    boxes and find out the type of the boxes.  We then collect the completely
    covered boxes and cut boxes into two BoxArrays.  This process can be costly
    because of the number of calls to the implicit functions.  In this commit,
    we have introduced a new ParmParse parameter, eb2.num_coarsen_opt with a
    default value of zero.  If for instance it is set to 3, we start the box
    type categorization at a resolution that is coarsened by a factor of 2^3.
    For the provisional cut boxes, we refine them by a factor of 2.  Then we chop
    them into small boxes and categorize the new boxes.  This process is
    performed recursively until we are at the original finest resolution.

    Users should be aware that, if eb2.num_coarsen_opt is too big, this could
    produce erroneous results because evaluating the implicit function on
    coarse boxes could miss fine structures in the EB.

    Thanks to Robert Marskar for sharing this algorithm.
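
    The coarse-to-fine categorization described above can be sketched in 1D.
    This is a simplified illustration under stated assumptions, not the
    AMReX implementation: `Interval`, `classify`, and `find_cut_boxes` are
    hypothetical names, a box is classified by sampling the implicit
    function at its nodes, and the function is assumed negative in the fluid
    and positive in the body.

    ```cpp
    #include <cassert>
    #include <functional>
    #include <vector>

    struct Interval { int lo; int hi; }; // cells [lo, hi) at some resolution

    enum class BoxType { regular, covered, cut };

    struct Result {
        std::vector<Interval> cut; // cut boxes at the finest resolution
        long evaluations = 0;      // number of implicit-function calls
    };

    // Classify a box by sampling f at its nodes; dx is the cell size.
    BoxType classify(const std::function<double(double)>& f, Interval b,
                     double dx, long& evals) {
        bool neg = false;
        bool pos = false;
        for (int i = b.lo; i <= b.hi; ++i) {
            ++evals;
            if (f(i * dx) < 0.0) { neg = true; } else { pos = true; }
        }
        if (neg && pos) { return BoxType::cut; }
        return neg ? BoxType::regular : BoxType::covered;
    }

    // Start at a resolution coarsened by 2^num_coarsen_opt, keep only the
    // provisional cut boxes, refine them by 2, chop, and repeat.
    Result find_cut_boxes(const std::function<double(double)>& f, int ncells,
                          int box_size, int num_coarsen_opt) {
        assert(ncells % (box_size << num_coarsen_opt) == 0);
        Result r;
        std::vector<Interval> candidates;
        candidates.push_back(Interval{0, ncells >> num_coarsen_opt});
        for (int lev = num_coarsen_opt; lev >= 0; --lev) {
            const double dx = static_cast<double>(1 << lev);
            std::vector<Interval> cut;
            for (Interval c : candidates) {
                for (int lo = c.lo; lo < c.hi; lo += box_size) {
                    Interval b{lo, lo + box_size};
                    if (classify(f, b, dx, r.evaluations) == BoxType::cut) {
                        cut.push_back(b);
                    }
                }
            }
            if (lev == 0) { r.cut = cut; break; }
            candidates.clear();
            for (Interval b : cut) {
                candidates.push_back(Interval{2 * b.lo, 2 * b.hi});
            }
        }
        return r;
    }
    ```

    For a single interface at x = 10.5 in a 64-cell domain with 4-cell
    boxes, both settings find the same cut box [8, 12), but starting two
    levels coarser needs 40 function evaluations instead of 80.  A feature
    narrower than the coarse node spacing would not change sign at any coarse
    node, which is the failure mode the warning above refers to.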

commit 557aae8
Author: Erik <[email protected]>
Date:   Wed Jul 6 08:54:24 2022 -0700

    point to new location of AMReX images, AMReX website repo (AMReX-Codes#2867)

commit cbdc658
Author: Axel Huebl <[email protected]>
Date:   Tue Jul 5 01:41:03 2022 +0200

    SENSEI 4.0: Fix Build for Particles (AMReX-Codes#2869)

    ## Summary

    This part causes a compile error now in WarpX.

    cc  @burlen @kwryankrattiger

    ## Additional background

    X-ref: Blocks WarpX 22.07 release ECP-WarpX/WarpX#3211

    Follow-up to:
    - AMReX-Codes#2785
    - AMReX-Codes#2834

commit dc8b734
Author: Andrew Myers <[email protected]>
Date:   Fri Jul 1 17:19:20 2022 -0700

    Cache the neighbor comm tags for the CPU implementation of fillNeighbors. (AMReX-Codes#2862)

    * Cache the neighbor comm tags for the CPU implementation of fillNeighbors.

    * fix areMasksValid function

commit 2b42fb5
Author: drangara <[email protected]>
Date:   Fri Jul 1 18:44:35 2022 -0400

    Remove some hard checks in check_mvmc for 3D (AMReX-Codes#2864)

    Removing some hard checks in 3D coarsening logic as it appears that those are not necessarily bad states, and a soft failure to coarsen should suffice.

commit 19c7068
Author: Erik <[email protected]>
Date:   Fri Jul 1 18:24:24 2022 -0400

    Carry over fix for ngbxy.smallEnd typo (AMReX-Codes#2868)

    This is a typo that was corrected in other places but didn't get fixed here.

commit d736ef2
Author: Weiqun Zhang <[email protected]>
Date:   Fri Jul 1 11:00:15 2022 -0700

    Update CHANGES for 22.07 (AMReX-Codes#2866)

commit be813d0
Author: Weiqun Zhang <[email protected]>
Date:   Fri Jul 1 10:29:13 2022 -0700

    Hypre: add version check (AMReX-Codes#2865)

    These HYPRE_SetSp* functions are only available in hypre >= 22500.

commit 8fb23ec
Author: Jon Rood <[email protected]>
Date:   Wed Jun 29 16:52:35 2022 -0600

    Refactor Make.nrel to use MPT for MPI with the Intel compiler on Eagle. (AMReX-Codes#2861)

commit 6f9a46c
Author: PaulMullowney <[email protected]>
Date:   Wed Jun 29 11:09:57 2022 -0600

    Adding control APIs and namespacing for core algorithm paths like SpGEMM, SpMV, and SpTrans. (AMReX-Codes#2859)

    Co-authored-by: Paul Mullowney <[email protected]>

commit e4c83cf
Author: Jon Rood <[email protected]>
Date:   Wed Jun 29 11:08:42 2022 -0600

    Add lib64 library location for ZFP since it may exist there instead of lib. (AMReX-Codes#2860)

commit b2b9150
Author: Burlen Loring <[email protected]>
Date:   Tue Jun 28 13:42:41 2022 -0700

    update the SENSEI in situ coupling for SENSEI v4.0.0 (AMReX-Codes#2785)

    In this release, an install of VTK is no longer required.
    To compile AMReX w/ SENSEI use:

    ```cmake
    -DAMReX_SENSEI=ON -DSENSEI_DIR=<path to SENSEI install>/<lib dir>/cmake
    ```

    Note: <lib dir> may be `lib` or `lib64` or something else depending on
    your OS and is determined by CMake at configure time. See the CMake
    GNUInstallDirs documentation for more information.

commit 2c5f475
Author: Andrew Myers <[email protected]>
Date:   Tue Jun 28 12:51:19 2022 -0700

    Write runtime attribs to checkpoints on GPUs (AMReX-Codes#2856)

commit d2cb546
Author: Jon Rood <[email protected]>
Date:   Tue Jun 28 13:27:02 2022 -0600

    Fix gnu make on Crusher for mpi_gtl_hsa (AMReX-Codes#2857)

    Update environment variable at OLCF for mpi_gtl_hsa.

commit 21fe4b3
Author: Axel Huebl <[email protected]>
Date:   Tue Jun 28 19:53:09 2022 +0200

    CMake: FindDependency CUDAToolkit (AMReX-Codes#2849)

    If we install AMReX with CUDA support using a modern
    CMake, we need to repopulate targets such as `CUDA::curand`
    from `find_dependency` for downstream.
    Downstream users find us via `find_package` and that target
    link dependency showed up to be unpopulated in MFIX.

commit 027f2ff
Author: Weiqun Zhang <[email protected]>
Date:   Thu Jun 23 16:15:57 2022 -0700

    Fix make help (AMReX-Codes#2854)

    This reverts the change in AMReX-Codes#2845, which fixed an issue with `make print-%`, but broke
    `make help`.  This is now fixed in a different way.  Both `make print-%` and `make help`
    should work now.

commit 3d3ad21
Author: kngott <[email protected]>
Date:   Thu Jun 23 13:39:59 2022 -0700

    NERSC Programming Environment prototype (AMReX-Codes#2848)

commit 4872676
Author: Weiqun Zhang <[email protected]>
Date:   Thu Jun 23 12:41:20 2022 -0700

    GNU Make: No need to query mpif90 if Fortran is not used. (AMReX-Codes#2852)

    This minimizes potential issues.

commit fc0d646
Author: Weiqun Zhang <[email protected]>
Date:   Thu Jun 23 12:23:55 2022 -0700

    Remove f90doc (AMReX-Codes#2851)

    We no longer use it.

commit 5188a6a
Author: Weiqun Zhang <[email protected]>
Date:   Thu Jun 23 11:09:15 2022 -0700

    Explicitly invoke python3 (AMReX-Codes#2850)

    According to PEP 394, a python distributor may choose to not provide the
    python command.  In fact, that's what recent versions of macOS do.

commit 2d931f6
Author: Andrew Myers <[email protected]>
Date:   Wed Jun 22 15:03:50 2022 -0500

    Maintain the high end of the 'roundoff domain' in both float and double precision (AMReX-Codes#2839)

    * Maintain the high end of the 'roundoff domain' in both float and double precision

    * fix shadowing

    * fix warning

    * fix float conversion warning

    * fix logic

    * Update Src/Base/AMReX_Geometry.H

    * Update Src/Base/AMReX_Geometry.H