Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Garbage data while appending local variables to a BP4 file several times #2987

Closed
sergeylesnik opened this issue Jan 3, 2022 · 12 comments
Closed

Comments

@sergeylesnik
Copy link

The BP4 engine is set to write in Append mode. The variables are local and in the particular case each variable belongs to a single rank. As a result the data written for steps >0 and ranks > 0 end up to be some random numbers. Interestingly enough the min/max (inspected with bpls -l) values are correct.

To Reproduce

  1. Compile an example derived from localArray_write
#include "adios2/common/ADIOSTypes.h"
#include <iostream>
#include <string>
#include <vector>

#include <adios2.h>

int main(int argc, char *argv[])
{
    int rank = 0;
#if ADIOS2_USE_MPI
    int nproc = 1;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);
#endif
    const int NSTEPS = 2;

    // generate different random numbers on each process,
    // but always the same sequence at each run
    srand(rank * 32767);

#if ADIOS2_USE_MPI
    adios2::ADIOS adios(MPI_COMM_WORLD);
#else
    adios2::ADIOS adios;
#endif

    const unsigned int Nx = rand() % 6 + 5;
    std::vector<double> v1(Nx);

    // Get io settings from the config file or
    // create one with default settings here
    adios2::IO io = adios.DeclareIO("Output");
    io.SetEngine("BP4");
    io.SetParameters({{"verbose", "4"}});

    for (int step = 0; step < NSTEPS; step++)
    {
        adios2::Variable<double> varV1 =
            io.DefineVariable<double>
            (
                "v1_rank" + std::to_string(rank)
                    + "_step" + std::to_string(step),
                {},
                {},
                {Nx}
            );

        adios2::Engine writer =
            io.Open("localArray.bp", adios2::Mode::Append);

        // v1
        for (size_t i = 0; i < Nx; i++)
        {
            v1[i] = rank * 1.0 + step * 0.1;
        }
        writer.Put<double>(varV1, v1.data());

        writer.Close();
    }

#if ADIOS2_USE_MPI
    MPI_Finalize();
#endif

    return 0;
}
  1. Run it
    mpirun -np 2 localArray_write_mpi

  2. Inspect the data

> bpls -d localArray.bp
  double   v1_rank0_step0  [1]*{6}
        step 0: 
          block 0: [0:5]
    (0)    0 0 0 0 0 0

  double   v1_rank0_step1  [1]*{6}
        step 1: 
          block 0: [0:5]
    (0)    0.1 0.1 0.1 0.1 0.1 0.1

  double   v1_rank1_step0  [1]*{10}
        step 0: 
          block 0: [0:9]
    (0)    1 1 1 1 1 1
    (6)    1 1 1 1 

  double   v1_rank1_step1  [1]*{10}
        step 1: 
          block 0: [0:9]
    (0)    3.75559e-308 1.31916e-320 0 0 4.03262e-313 1.3241e-321
    (6)    2.07024e-317 1.93414e+141 1 1

> bpls -l localArray.bp
  double   v1_rank0_step0  [1]*{6} = 0 / 0
  double   v1_rank0_step1  [1]*{6} = 0.1 / 0.1
  double   v1_rank1_step0  [1]*{10} = 1 / 1
  double   v1_rank1_step1  [1]*{10} = 1.1 / 1.1

Expected behavior
v1_rank1_step1 variable is populated with 1.1

Desktop (please complete the following information):

  • OS/Platform: Ubuntu 20.04
  • Build: compiler version gcc 9.3.0, cmake version 3.16.3, ADIOS2: v2.7.1 and latest from January 3rd

When Open(...) and Close() are moved outside the for loop, everything is fine. I suppose that opening and closing an appending engine for a subset of variables to be written is not the default use case but I couldn't find any hint in the documentation that this is not supported.

@williamfgc
Copy link
Contributor

@sergeylesnik thanks for providing the code snippet.

I suppose that opening and closing an appending engine for a subset of variables to be written is not the default use case but I couldn't find any hint in the documentation that this is not supported.

Yes, Open and Close are collective operations and running them on a loop defeats the purpose of I/O performance. The preferred way is to use BeginStep and EndStep to separate transport operations in Open/Close.

What's surprising is that Append still works the first time a file is opened (assuming it doesn't exist), maybe it needs an exception. Also, Append mode in parallel I/O is tricky since the same number of streams must be conserved (local and aggregators) so it's not as straight-forward (as in serial I/O) to add to any dataset. It's not the case in your example, but it's something to consider.

@sergeylesnik
Copy link
Author

sergeylesnik commented Jan 5, 2022

@williamfgc tank you for the quick response.

Yes, Open and Close are collective operations and running them on a loop defeats the purpose of I/O performance.

The preferred way is to use BeginStep and EndStep to separate transport operations in Open/Close.
I follow from this that the expected use case is to Open and Close an Engine only once for .bp file written.

Append mode in parallel I/O is tricky since the same number of streams must be conserved

I can imagine that this is a difficult task. So, is this supported by ADIOS2?

Back to the original problem. I've done some testing and found something that looks definitely like a bug to me. Consider the following example:

  1. Introduce in localArray_write.cpp two changes:
    • io.SetEngine("BP3"); -> io.SetEngine("BP4");
    • io.Open("localArray.bp", adios2::Mode::Write); -> io.Open("localArray.bp", adios2::Mode::Append);
  2. Compile
  3. Run twice with mpirun -np 2 localArray_write_mpi

The values from the first run are OK but the second run delivers again garbage data for the 2nd rank. If I manually change the variable names (v1 -> vv1) in localArray_write.cpp after the 1st run and before the 2nd one, the behavior is the same. Now, if I add io.SetParameters({{"NumAggregators", "2"}}); directly below io.SetEngine("BP4"); everything is fine again. So, it looks like appending works only without aggregation.

@pnorbert
Copy link
Contributor

pnorbert commented Jan 5, 2022

Yes, this is a bug where aggregation + append causes corrupt data. Related to #2482

@williamfgc
Copy link
Contributor

williamfgc commented Jan 5, 2022

What @pnorbert said. Also, BP3 doesn't support "Append" mode (it should throw an exception, though), BP4 was added with the purpose of being a more streaming-friendly format, but as you can see there is always corner cases.

@sergeylesnik
Copy link
Author

Thanks for clarifying!

We are implementing ADIOS as a parallel IO into OpenFOAM (a computational fluid dynamics library) and one of the important features in this context is aggregation. This bug renders the "Append" mode useless for us. Are there any plans to fix it?

@pnorbert
Copy link
Contributor

pnorbert commented Jan 7, 2022

We will fix this

@pnorbert
Copy link
Contributor

pnorbert commented Jan 7, 2022

This should be fixed now in master

@sergeylesnik
Copy link
Author

Thank you, it works properly now. I close the issue.

@williamfgc
Copy link
Contributor

We are implementing ADIOS as a parallel IO into OpenFOAM (a computational fluid dynamics library) and one of the important features in this context is aggregation.

@sergeylesnik FYI, @olesenm has done work integrating adios with OpenFOAM

@olesenm
Copy link

olesenm commented Jan 11, 2022

@sergeylesnik - I'm not sure what exactly or how you are implementing ADIOS as parallel IO in OpenFOAM, but as @williamfgc mentioned there has already been some work done on this - Henrik R. at your company surely has this information too.

@sergeylesnik
Copy link
Author

@williamfgc Yes, thanks, I knew about it. But our concept is different.
@olesenm Thank you, I learned a lot from your implementation. Considering our ansatz it's far from being ready for production but if you are interested you may find it in the wp3 repo.

@olesenm
Copy link

olesenm commented Jan 11, 2022

but if you are interested you may find it in the wp3 repo.

OK that makes much more sense now - as long as it's going into the wp3 proof-of-concept should be alright.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants