Variable loses selection after engine close #3057

Closed
hr87 opened this issue Feb 17, 2022 · 33 comments

Comments

@hr87

hr87 commented Feb 17, 2022

I declare my variables, then when I want to save a step, I open the file, write a step, and close the file to avoid having open files and to ensure data is written to disk.

This works fine on a single rank. On multiple ranks, each rank writes a portion of the array, which is defined by the declare_variable routine. This works fine for the first write. However, after closing and reopening the engine/file, the variable loses its selection, and values are written to random places.

Steps to reproduce:

  1. declare variable where each rank writes part of the array (no overlap)
  2. open engine
  3. start step
  4. get variable from inquire_variable by name
  5. write variables in a step (this works fine)
  6. finish step
  7. close engine
  8. open engine (same file with append)
  9. perform same steps as above 3-6
  10. close engine

The output of the second step is garbled. The variables forget where to put their sections and place them randomly in the array, leaving some places uninitialized and the others out of order.

Additional info: I am using the Fortran interface

@eisenhauer
Member

Generally, opening and closing the file on every timestep is going to destroy your I/O performance...

@eisenhauer
Member

eisenhauer commented Feb 17, 2022

Also, there is some ambiguity on "append" mode, WRT what happens with variables. The ADIOS API doesn't clearly differentiate between variables created by the application (via declare variable) and variables implicitly created upon read, or more specifically upon loading metadata (which may occur on open(), depending upon the engine). If I were guessing here, I'd say that your step 8 is replacing the variables that you declared in step 1, destroying the individual start and count geometries (which aren't recreated on read). So the variable that you're getting in step 4 the second time isn't the same one that you got the first time through.

This is a semantic ambiguity in ADIOS, and it would be difficult to make it behave as you expect. However, you can easily set your write geometry anew between steps 4 and 5 to resolve the ambiguity.
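A minimal sketch of that workaround with the Fortran bindings, assuming a 1-D array variable that was defined without constant dimensions (with constant dimensions ADIOS2 may reject a new selection). The helper name `reapply_selection` and its arguments are hypothetical and not part of the module discussed in this thread; `adios2_inquire_variable` and `adios2_set_selection` are the binding routines.

```fortran
! Hypothetical helper: look a variable up by name and re-apply this
! rank's 1-D block selection before calling adios2_put.
subroutine reapply_selection(io, variable_name, offset, local_count, ierr)

    use adios2, only: adios2_io, &
                      adios2_variable, &
                      adios2_inquire_variable, &
                      adios2_set_selection

    implicit none

    ! I/O object that owns the variable
    type(adios2_io), intent(in) :: io
    ! name used when the variable was defined
    character(len=*), intent(in) :: variable_name
    ! zero-based start of this rank's block and its length
    integer(kind=8), intent(in) :: offset, local_count
    ! ADIOS2 return value, 0 on success
    integer, intent(out) :: ierr

    ! handle returned by the lookup
    type(adios2_variable) :: variable

    ! after an open in append mode this handle may refer to a variable
    ! that no longer carries the start/count given at definition time
    call adios2_inquire_variable(variable, io, variable_name, ierr)
    if (ierr /= 0 .or. .not. variable % valid) return

    ! set start and count again for a 1-D variable
    call adios2_set_selection(variable, 1, (/offset/), (/local_count/), ierr)

end subroutine reapply_selection
```

Calling this between steps 4 and 5 keeps the write geometry independent of whatever the open in step 8 did to the variable.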

(That said, there is a large amount of overhead involved in creating an engine and opening a file for append. It's hard to imagine a situation where closing the file between steps is actually a good idea.)

@williamfgc
Contributor

Agree with @eisenhauer. @hr87 please try using adios2_begin_step/adios2_end_step; steps are kept separate from open/close for performance reasons.

@pnorbert
Contributor

Interesting error, it might be related to the now fixed #2482, or a new bug. Are you experiencing this problem with the master branch or with 2.7.1?

BTW, there is no point in doing close and reopen in your case. They do not ensure what you want (data written to disk), and they are causing you a lot of unnecessary overhead.

@hr87
Author

hr87 commented Feb 17, 2022

I use tag v2.7.1.436 currently.

I tried resetting the selection, but that does not work, since I define the variables with constant shape/selection. So calling set_selection is denied by adios.

I am using begin_step and end_step. However, the output definitely ends up in an inconsistent state. I am using it for restart. When reading the last step, it errors out, reporting that there is no data for step n. Looking at the file with bpls, I can see that the arrays only have n-1 steps in them.

I am currently trying the flush methods, but these give me buffer overflows. So how can I ensure that everything is written and in a consistent state before continuing with the simulation?

@williamfgc
Contributor

williamfgc commented Feb 17, 2022

@hr87 sharing code would help as there are many concepts touched in your use-case. There is no mention of begin_step/end_step in the original steps. My bad, steps 3 and 6 indicate that.

@hr87
Author

hr87 commented Feb 17, 2022

program main

  use adios_writer

  call init_adios(filename, append)

  call declare_string_variable('name')

  do i = 1,num_timesteps
    if (save_timestep) then

      ! do calculation
      
      call begin_adios_step

      call write_string_variable('name', value)

      call finish_adios_step
    end if
  end do

  call finalize_adios

end program

where the module looks like this (I omitted different types of variables)

! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
! module to write ADIOS2 files
! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
module adios_writer

    ! import ADIOS2 module
    use adios2, only: adios2_adios, &
                      adios2_io, &
                      adios2_engine

    ! variable declaration
    implicit none

    ! adios object
    type(adios2_adios), private :: adios
    ! I/O object
    type(adios2_io), private :: adios_io
    ! engine object
    type(adios2_engine), private :: adios_engine
    ! output file name
    character(len=200), private :: output_file

contains

    ! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    ! setting up ADIOS
    ! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    subroutine init_adios(filename, append, debug, config_file)

        ! adios 2 functions
        use adios2, only: adios2_init, &
                          adios2_declare_io, &
                          adios2_open, &
                          adios2_mode_write, &
                          adios2_mode_append

#ifdef petsc
        ! mpi information
        use ifp_mpi, only: mpi_info
#endif
        ! helper functions
        use adios_helper, only: check_return_value

        ! variable declaration
        implicit none

        ! file name to open
        character(len=*), intent(in) :: filename
        ! flag to append to file
        logical, intent(in) :: append
        ! flag to use debug mode
        logical, intent(in) :: debug
        ! flag to use a config file
        character(len=*), intent(in) :: config_file

        ! open mode, append or create new
        integer :: write_mode
        ! adios return value
        integer :: adios_return

        ! store file name
        output_file = filename

#ifdef petsc
        ! check for config file
        if (config_file /= '') then
            ! init ADIOS 2 for MPI execution, with a config file
            call adios2_init(adios, config_file, mpi_info % communicator_x, debug, adios_return)
        else
            ! init ADIOS 2 for MPI execution, without config file
            call adios2_init(adios, mpi_info % communicator_x, debug, adios_return)
        end if ! config file
#else
        ! check for config file
        if (config_file /= '') then
            ! init ADIOS 2 for serial, with a config file
            call adios2_init(adios, config_file, debug, adios_return)
        else
            ! init adios for serial execution, without config file
            call adios2_init(adios, debug, adios_return)
        end if ! config file
#endif

        ! check for errors
        call check_return_value(adios_return, 'init_adios:init')

        ! define IO object
        call adios2_declare_io(adios_io, adios, 'lo_system', adios_return)
        ! check for errors
        call check_return_value(adios_return, 'init_adios:declare_io')

        ! check if we append to an existing file
        if (append) then
            ! set mode to append
            write_mode = adios2_mode_append
        else
            ! set mode to write, which overwrites or creates a new file
            write_mode = adios2_mode_write
        end if ! append

        ! open file with engine
        call adios2_open(adios_engine, adios_io, output_file, write_mode, adios_return)
        ! check for error
        call check_return_value(adios_return, 'init_adios:open')

    end subroutine init_adios

    ! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    ! shut down ADIOS
    ! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    subroutine finalize_adios

        ! adios 2 functions
        use adios2, only: adios2_flush_all, &
                          adios2_close, &
                          adios2_finalize

        ! helper functions
        use adios_helper, only: check_return_value

        ! variable declaration
        implicit none

        ! error return variable
        integer :: adios_return

        ! check if initialized
        if (adios % valid) then
            ! flush everything to disk
            call adios2_flush_all(adios, adios_return)
            ! check for errors
            call check_return_value(adios_return, 'finalize_adios:flush')

            ! close engine
            call adios2_close(adios_engine, adios_return)
            ! check for errors
            call check_return_value(adios_return, 'finalize_adios:close')

            ! finalize adios object
            call adios2_finalize(adios, adios_return)
            ! check for errors
            call check_return_value(adios_return, 'finalize_adios')

        end if ! adios valid

    end subroutine finalize_adios

    ! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    ! begin a new time step
    ! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    subroutine begin_adios_step

        ! adios 2 routines
        use adios2, only: adios2_begin_step, &
                          adios2_step_mode_append

        ! helper functions
        use adios_helper, only: check_return_value

        ! variable declaration
        implicit none

        ! error return variable
        integer :: adios_return

        ! begin a new time step
        call adios2_begin_step(adios_engine, adios2_step_mode_append, adios_return)
        ! check for errors
        call check_return_value(adios_return, 'begin_adios_step:begin')

    end subroutine begin_adios_step

    ! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    ! finish a new time step
    ! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    subroutine finish_adios_step

        ! adios 2 routines
        use adios2, only: adios2_end_step, &
                          adios2_flush_all_engines

        ! helper functions
        use adios_helper, only: check_return_value

        ! variable declaration
        implicit none

        ! error return variable
        integer :: adios_return

        ! execute all deferred puts
        call force_write

        ! end the current time step
        call adios2_end_step(adios_engine, adios_return)
        ! check for errors
        call check_return_value(adios_return, 'finish_adios_step:finish')

        ! flush to disk
        ! THIS CALL GIVES BUFFER OVERFLOW
        call adios2_flush_all_engines(adios_io, adios_return)
        ! check for errors
        call check_return_value(adios_return, 'finish_adios_step:flush')

    end subroutine finish_adios_step

    ! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    ! force write of deferred puts
    ! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    subroutine force_write

        ! adios 2 routines
        use adios2, only: adios2_perform_puts

        ! helper functions
        use adios_helper, only: check_return_value

        ! variable declaration
        implicit none

        ! error return variable
        integer :: adios_return

        ! execute all deferred puts
        call adios2_perform_puts(adios_engine, adios_return)
        ! check for errors
        call check_return_value(adios_return, 'force_write')

    end subroutine force_write

    ! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    ! declare a global string variable
    ! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    subroutine declare_string_variable(variable_name)

        ! adios library
        use adios2, only: adios2_define_variable, &
                          adios2_variable, &
                          adios2_type_string

        ! adios helper functions
        use adios_helper, only: check_return_value

        ! variable declaration
        implicit none

        ! variable name
        character(len=*), intent(in) :: variable_name

        ! adios variable handle
        type(adios2_variable) :: variable
        ! error return variable
        integer :: adios_return

        ! create variable
        call adios2_define_variable(variable, adios_io, variable_name, adios2_type_string, adios_return)
        ! check for errors
        call check_return_value(adios_return, 'declare_string_variable: '//variable_name)

    end subroutine declare_string_variable

    ! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    ! write a single string value to file
    !
    ! We must be in a valid adios time step, the engine must be open, otherwise this will fail.
    ! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    subroutine write_string_variable(variable_name, string, force_write)

        ! adios methods
        use adios2, only: adios2_inquire_variable, &
                          adios2_put, &
                          adios2_variable, &
                          adios2_mode_deferred, &
                          adios2_mode_sync

        ! mpi info
        use ifp_mpi, only: mpi_info
        ! helper functions
        use adios_helper, only: check_return_value

        ! variable declaration
        implicit none

        ! variable name
        character(len=*), intent(in) :: variable_name
        ! value to write
        character(len=*), intent(in) :: string
        ! force write now, needed for temporary variables, default false
        logical, intent(in), optional :: force_write

        ! handle to adios variable
        type(adios2_variable) :: variable
        ! write mode
        integer :: write_mode
        ! error return variable
        integer :: adios_return

        ! set write mode to default deferred
        write_mode = adios2_mode_deferred

        ! check if flag is present
        if (present(force_write)) then
            ! check the flag
            if (force_write) then
                ! set write mode to sync
                write_mode = adios2_mode_sync
            end if ! force_write
        end if ! present force_write

        if (mpi_info % is_lo_main_rank) then
            ! get adios variable from io object
            call adios2_inquire_variable(variable, adios_io, variable_name, adios_return)
            ! check for errors
            call check_return_value(adios_return, 'write_string_variable:get_variable '//variable_name)

            ! submit values for writing
            call adios2_put(adios_engine, variable, string, write_mode, adios_return)
            ! check for errors
            call check_return_value(adios_return, 'write_string_variable:put '//variable_name)

        end if ! main LO rank

    end subroutine write_string_variable

end module adios_writer

The problem I currently have is the flush operation.

@pnorbert
Contributor

Can you share the config file please?

@pnorbert
Contributor

I am using begin_step and end_step. However, I definitely have problems that the output is in an inconsistent state. I am using it for restart. Reading in the last step, it errors out with that there is no data for step n. And looking at the file with bpls I can see that the arrays only have n-1 steps in them.

This should not be happening unless you set a buffer size for BP4 engine big enough to buffer 2+ steps. end_step should write out the data. With "inconsistent" do you mean you have trouble reading in data, the file is corrupt or the data is corrupt, or as you indicated that it only has n-1 (perfectly readable) steps and then there is no more there?

@hr87
Author

hr87 commented Feb 17, 2022

Can you share the config file please?

I am not using a config file. I am using the default settings from adios.

By inconsistent I mean that when I try reading it for a restart (selecting the last step), it will try to read step n. But step n is not in the file; only the number of steps seems to be already updated in the file. The arrays only have n-1 steps (looking at it with bpls).

@williamfgc
Contributor

The problem I currently have is the flush operation.

Is there any difference if you try without the flush subroutines? By default, adios2 BP engines write to disk (flush) at EndStep or Close. Flush in adios2 was designed for cases where the user manages buffer memory across many steps; in the default case it is not required, as adios2 is "step-based" I/O. Hope it helps.
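As a sketch of what "without the flush subroutines" could look like, the step-finishing routine reduces to a single call. The routine name `finish_adios_step_no_flush` is hypothetical; it assumes the default BP engine settings described above, where ending the step is enough to get the step written.

```fortran
! Hypothetical variant of finish_adios_step without explicit flush calls.
subroutine finish_adios_step_no_flush(engine, ierr)

    use adios2, only: adios2_engine, &
                      adios2_end_step

    implicit none

    ! engine that is currently inside a begin_step/end_step pair
    type(adios2_engine), intent(in) :: engine
    ! ADIOS2 return value, 0 on success
    integer, intent(out) :: ierr

    ! ending the step executes the deferred puts; no separate
    ! perform_puts or flush call is issued here
    call adios2_end_step(engine, ierr)

end subroutine finish_adios_step_no_flush
```

Comparing the output of this variant with the flushing version is the incremental test being asked for.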

@pnorbert
Contributor

How do you know that step 'n' was supposed to be written before the application died? That is, that end_step was called n times and not n-1 times?

Just checking: steps start from 0 when reading, so if your routine had a loop i = 1, n, then at reading you refer to the last written as n-1. But bpls should report arrays with n*{...}.

@hr87
Author

hr87 commented Feb 17, 2022

      ! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
      ! select the time step to read
      ! currently only selects the first step in the file
      ! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
      subroutine select_step(step)

          ! type definitions
          use ifp_types, only: standard
          ! io routines
          use ifp_io, only: message, &
                            to_str
          ! helper functions
          use adios_helper, only: check_return_value

          ! variable declaration
          implicit none

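          ! index of selected step (1-based), -1 for last
          integer(standard), intent(in), optional :: step

          ! local copy of the step, since an absent optional argument must not be assigned to
          integer(standard) :: selected_step

          ! default to the last step
          selected_step = -1
          ! check if step is present
          if (present(step)) then
              ! use the requested step
              selected_step = step
          end if ! present
          ! check for last step
          if (selected_step == -1) then
              ! use the number of steps in the file, i.e. the 1-based index of the last step
              selected_step = int(number_steps_in_file, standard)
          end if ! last step

          ! assign step index to module variable, we need to convert to a long, 0 based indexing
          read_step_index = int(selected_step, 8) - 1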

          ! write step selection
          call message('Reading step '//to_str(read_step_index + 1)//' out of '//to_str(number_steps_in_file))

      end subroutine select_step

This is how I select the last step, and the number of steps I read with here

         ! check for config file
         if (config_file /= '') then
             ! init ADIOS 2 for serial, with a config file
             call adios2_init(adios, config_file, debug, adios_return)
         else
             ! init adios for serial execution, without config file
             call adios2_init(adios, debug, adios_return)
         end if ! config file

         ! check for errors
         call check_return_value(adios_return, 'init_adios_reader:init')

         ! define IO object
         call adios2_declare_io(adios_io, adios, 'lo_system_reader', adios_return)
         ! check for errors
         call check_return_value(adios_return, 'init_adios_reader:declare_io')

         ! open file for reading
         call adios2_open(adios_engine, adios_io, filename, adios2_mode_read, adios_return)
         ! check for errors
         call check_return_value(adios_return, 'init_adios_reader:open: '//trim(filename))

         ! get number of steps in file
         call adios2_steps(number_steps_in_file, adios_engine, adios_return)
         ! check for errors
         call check_return_value(adios_return, 'init_adios_reader:steps: '//trim(filename))

         ! write file status
         call message('Reading adios file '//trim(filename) &
                      //' ('//to_str(number_steps_in_file)//' steps in file)')

         ! set step to be read
         call select_step(-1)

I currently can't reproduce the error, but I get a message that the variable I then try to read does not have as many steps as required to read the last step. And if I look with bpls, all variables have only n-1 timesteps, even though the code above tries to read timestep n.

@hr87
Author

hr87 commented Feb 17, 2022

@williamfgc with flush, I get a buffer overflow, without flush I have problems that the file might not be written consistently, and data is missing.

@williamfgc
Contributor

without flush I have problems that the file might not be written consistently, and data is missing.

@hr87 looking at the code it looks like you're using BP4 (default) and calling flush right before close and right after end_step. These functions call the file system internally, so there shouldn't be any difference in principle with or without flush (hence my request). I'd try an incremental approach before jumping into flush related functionality.

@hr87
Author

hr87 commented Feb 17, 2022

Here is the error message:

Reading adios file ifp.out (8 steps in file)
Reading step 7 out of 7

ERROR: steps start 7 from SetStepsSelection or BeginStep is larger than the maximum available step 5 for variable mesh/size_x, in call to Get

And this is how I read in the number of steps (first line seen in output above)

call adios2_steps(number_steps_in_file, adios_engine, adios_return)

and then I try reading step n-1 (second line)

It seems to happen when I append to an already existing file.

@williamfgc
Contributor

@hr87 that seems to be associated to one variable. Is it intended to be present in all steps? Does it happen with other variables? Please follow up on the flush issue as it's strange. Thanks!

@pnorbert
Contributor

I still do not understand the difference with and without close. Can you please send the bpls -la and bpls -lat output of the file when using close and when not?

@pnorbert
Contributor

As @williamfgc noted, each variable has its own steps, so the logic in your modules only works if every variable is written in every step.

This is confusing, we realize. adios2_steps() returns the number of steps N written to a file, but adios2_set_step_selection() applies to an individual variable, which has K steps where K <= N.
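The arithmetic behind a safe "last step" choice can be sketched without any ADIOS2 calls. The program below is only an illustration: the function name `last_readable_step` and the counts 3 and 2 are hypothetical, and how K is obtained from the library for a given variable is left out.

```fortran
program last_step_demo

    implicit none

    ! N: steps reported for the file, K: steps available for one variable
    integer(kind=8) :: file_steps, variable_steps, last_index

    ! hypothetical counts: the file reports 3 steps, the variable only has 2
    file_steps = 3_8
    variable_steps = 2_8

    last_index = last_readable_step(file_steps, variable_steps)
    print '(a,i0)', 'last readable step index: ', last_index

contains

    ! zero-based index of the last step that exists for the variable,
    ! or -1 if the variable has no steps at all
    pure function last_readable_step(n_file, n_variable) result(idx)
        integer(kind=8), intent(in) :: n_file, n_variable
        integer(kind=8) :: idx

        ! K <= N is expected, but min() guards against either count being off
        idx = min(n_file, n_variable) - 1_8
    end function last_readable_step

end program last_step_demo
```

A reader that picks the step this way never asks for a step beyond what the variable actually holds, although it would also hide a step count mismatch like the one reported in this thread.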

@pnorbert
Contributor

Reading adios file ifp.out (8 steps in file)
Reading step 7 out of 7

Just a curiosity: why does number_steps_in_file change from 8 to 7 in this example?

@hr87
Author

hr87 commented Feb 18, 2022

@hr87 that seems to be associated to one variable. Is it intended to be present in all steps? Does it happen with other variables? Please follow up on the flush issue as it's strange. Thanks!

It is the first variable I read. So that is where the reader already freaks out.

As for the flush issue: it writes the file without flush. With flush I get a buffer overflow. But I think it is not the actual flush that causes the buffer overflow, but writing a new step after the flush.

@hr87
Author

hr87 commented Feb 18, 2022

Reading adios file ifp.out (8 steps in file)
Reading step 7 out of 7

Just a curiosity: why does number_steps_in_file change from 8 to 7 in this example?

@pnorbert Because of zero indexing: 8 steps means the last step has index 7.

As @williamfgc noted, each variable has its own steps, so the logic in your modules only works if every variable is written in every step.

This is confusing we realize. The adios2_steps() returns the number of steps N written to a file but then the adios2_set_step_selection() is for an individual variable which has K steps where K <= N.

Every variable is written in every step. But after some more debugging I figured out that the actual problem is that I try to append to an already existing file. So the problem is actually:

  • Writing file for the first time => all good, all steps in
  • Read file for restart (open, read, close) => read last step N
  • Append new steps to file => now the step index in the file is N + 1, but all variables have only N values. No further steps are added to the file, even though the simulation and write continues perfectly fine.

bpls after initial run (should be 2 steps)

$ bpls ifp.out -la mesh/size_x
int32_t  mesh/size_x                         2*scalar = 256 / 256
$ bpls ifp.out -lat mesh/size_x
Step 0:
  int32_t  mesh/size_x                         scalar
Step 1:
  int32_t  mesh/size_x                         scalar

After restart (should be 6 steps)

$ bpls ifp.out -la mesh/size_x
int32_t  mesh/size_x                         2*scalar = 256 / 256
$ bpls ifp.out -lat mesh/size_x
Step 0:
  int32_t  mesh/size_x                         scalar
Step 1:
  int32_t  mesh/size_x                         scalar
Step 2:

The last line is not a copy error, it is actually empty. However, as I mentioned above, the writer happily wrote 4 additional steps after restarting, no error or warning. They just don't show up.

@pnorbert
Contributor

Can you try the master branch? Append should properly work for BP4 in there.

@hr87
Author

hr87 commented Feb 21, 2022

Can you try the master branch? Append should properly work for BP4 in there.

Same result. The error message looks different now but I still get the same error and output with bpls:

[Mon Feb 21 10:36:58 2022] [ADIOS2 ERROR] : adios2_get: [Mon Feb 21 10:36:58 2022] [ADIOS2 EXCEPTION] format::bp::BP4Deserializer : steps start 3 from SetStepsSelection or BeginStep is larger than the maximum available step 2 for variable mesh/size_x, in call to Get

@pnorbert
Contributor

I don't have more ideas what goes wrong. Can you share your code that we can build and run? The module inserted here has missing module dependencies.

@hr87
Author

hr87 commented Feb 22, 2022

@pnorbert I will try to create a working example that shows this behavior

@hr87
Author

hr87 commented Mar 14, 2022

@pnorbert I am still working on an example that shows this behavior. My test program so far does not have the bug. One thing I noticed is that when I try to read the file either with bpls -t -d or with Python, iterating over all steps, both programs hang after reading the last step. If I limit the number of steps to the actual number of steps, this does not happen. It seems that somehow an unfinished step is appended to the file, throwing everything off.

@pnorbert
Contributor

The file is probably "active", that is, it was not closed properly.

bpls -V <file> will tell you if that is the case, being "Active" the last word on the line.

adios2_deactivate_bp <file> should turn off the magic byte to indicate that no more steps are expected from the writer.

@hr87
Author

hr87 commented Mar 14, 2022

Yes, the file is marked as active. Can this also explain the restart problems, i.e. trying to append to an active file? I will put in more debug output to see if all routines that finish steps and close files are called as expected.

@pnorbert
Contributor

No, appending should be fine regardless of active status.

@hr87
Author

hr87 commented Jul 20, 2022

Hey @pnorbert

Sorry, it took me a while, but I managed to get a minimal working example. I tracked the problem down to the append functionality. When I try to append to an already existing file, the variable selection is completely messed up. The example simply creates a new file if the output file does not exist; otherwise it appends. It should write the same values every time, where each rank should write 10 values. The first time it works fine. The second time, with append, the 2nd rank does not write in the right spot, even though nothing changed in the variable selection.

Here is the bpls output. The second timestep was written via append. The hundreds digit is the rank number, the rest is the running index on the rank.

Step 0:
  double   var   {20}
    ( 0)    101 102 103 104 105 106
    ( 6)    107 108 109 110 201 202
    (12)    203 204 205 206 207 208
    (18)    209 210

Step 1:
  double   var   {20}
    ( 0)    101 102 103 104 105 106
    ( 6)    107 108 109 110 3.75559e-308 1.31916e-320
    (12)    6.61495e-318 8.64522e-316 4.03262e-313 1.3241e-321 2.08556e-317 1.93414e+141
    (18)    201 202
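For comparison, the layout that both steps are expected to have can be reproduced without MPI or ADIOS2. This is only a sketch under assumptions: two ranks, a block decomposition with `global_offset = rank * size_x`, and values of the form `100 * (rank + 1) + i`, which is inferred from the Step 0 output above rather than taken from the original code.

```fortran
program expected_layout

    implicit none

    ! assumed setup: 2 ranks, 10 values per rank
    integer, parameter :: comm_size = 2
    integer(kind=8), parameter :: size_x = 10

    ! field information, named as in the test program
    integer(kind=8) :: global_size_x, global_offset, i
    ! simulated rank index (zero-based, as in MPI)
    integer :: rank
    ! the assembled global array
    real(kind=8), dimension(:), allocatable :: global_values

    global_size_x = comm_size * size_x
    allocate (global_values(global_size_x))

    ! loop over the ranks instead of running them in parallel
    do rank = 0, comm_size - 1
        ! zero-based start of this rank's block in the global array
        global_offset = rank * size_x
        do i = 1, size_x
            ! hundreds digit encodes the rank, the rest is the running index
            global_values(global_offset + i) = real(100 * (rank + 1), kind=8) + real(i, kind=8)
        end do
    end do

    ! print six values per line, similar to the bpls output
    print '(6(1x,i0))', int(global_values)

end program expected_layout
```

In the appended step above, the second rank's block starts at global index 18 instead of 10, which is what a lost or altered start offset would look like.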

Here is the code. I use inquire to normally get the variable from adios by name in the code. Not sure, if that affects anything. I also tried it with dynamic selection and reset the selection, but that fails too (set static = .false.).

! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
! function to check for an error in return value
! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
subroutine check_return_value(adios_return, function_name)

    ! variable declaration
    implicit none

    ! adios return value
    integer, intent(in) :: adios_return
    ! message string
    character(len=*), intent(in) :: function_name

    select case (adios_return)
    case (0)
        ! all good, no error

    case (1)
        ! print error message and exit
        print *, function_name, 'ADIOS Invalid argument'

    case (2)
        ! print error message and exit
        print *, function_name, 'ADIOS system error'

    case (3)
        ! print error message and exit
        print *, function_name, 'ADIOS runtime error'

    case (4)
        ! print error message and exit
        print *, function_name, 'ADIOS exception'

    case default
        ! print error message and exit
        print *, function_name, 'Undefined ADIOS return value'

    end select ! adios_return

end subroutine check_return_value

! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
! main function
! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
program test

    use mpi, only: mpi_init, &
                   mpi_finalize, &
                   mpi_comm_size, &
                   mpi_comm_rank, &
                   MPI_COMM_WORLD

    use adios2

    ! flag to use static variable size
    logical, parameter :: static = .true.

    ! adios object
    type(adios2_adios) :: adios
    ! I/O object
    type(adios2_io) :: adios_io
    ! engine object
    type(adios2_engine) :: adios_engine
    ! adios variable handle
    type(adios2_variable) :: variable, variable2
    ! variable name
    character(len=*), parameter :: variable_name = 'var'

    ! filename
    character(len=*), parameter :: filename = 'test.out'
    ! flag to append to file
    logical :: append
    ! open mode, append or create new
    integer :: write_mode
    ! mpi error code
    integer :: mpi_error
    ! adios return value
    integer :: adios_return

    ! number of mpi ranks
    integer :: comm_size
    ! current mpi rank index
    integer :: rank
    ! field information
    integer(8) :: global_size_x, global_offset, size_x
    ! values
    real(8), dimension(:), allocatable :: values
    ! loop variable
    integer :: i

    ! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    ! init

    ! initialize MPI interface
    call mpi_init(mpi_error)

    ! check whether the output file exists; if so, append, otherwise create a new file
    inquire (file=filename, exist=append)

    ! init ADIOS 2 for MPI execution, without config file
    call adios2_init(adios, MPI_COMM_WORLD, .true., adios_return)
    ! check for errors
    call check_return_value(adios_return, 'init_adios:init')

    ! define IO object
    call adios2_declare_io(adios_io, adios, 'test', adios_return)
    ! check for errors
    call check_return_value(adios_return, 'init_adios:declare_io')

    ! choose the open mode depending on whether the file already exists
    if (append) then
        ! set mode to append to the existing file
        write_mode = adios2_mode_append
    else
        ! set mode to write, which overwrites or creates a new file
        write_mode = adios2_mode_write
    end if ! append

    ! open file with engine
    call adios2_open(adios_engine, adios_io, filename, write_mode, adios_return)
    ! check for error
    call check_return_value(adios_return, 'init_adios:open')

    ! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    ! declare variables

    ! get comm size
    call mpi_comm_size(MPI_COMM_WORLD, comm_size, mpi_error)
    ! get current rank
    call mpi_comm_rank(MPI_COMM_WORLD, rank, mpi_error)

    ! set local size
    size_x = 10
    ! calculate global size
    global_size_x = comm_size * size_x
    ! calculate start position
    global_offset = rank * size_x

    ! create variable
    call adios2_define_variable(variable, adios_io, variable_name, adios2_type_dp, &
                                1, [global_size_x], [global_offset], [size_x], &
                                static, adios_return)
    ! check for errors
    call check_return_value(adios_return, 'declare_cell_variable: '//variable_name)

    ! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    ! step

    ! create array
    allocate (values(size_x))

    ! loop over all entries
    do i = 1, size_x
        ! calculate value based on rank and element
        values(i) = 100 * (rank + 1) + i

    end do ! i

    ! begin a new time step
    call adios2_begin_step(adios_engine, adios2_step_mode_append, adios_return)
    ! check for errors
    call check_return_value(adios_return, 'begin_adios_step:begin')

    ! get adios variable from io object
    call adios2_inquire_variable(variable, adios_io, variable_name, adios_return)
    ! check for errors
    call check_return_value(adios_return, 'write_cell_variable:get_variable '//variable_name)

    if (.not. static) then
        ! set the selection of the variable to be written
        call adios2_set_selection(variable, 1, [global_offset], [size_x], adios_return)
        ! check return value
        call check_return_value(adios_return, 'write_cell_variable:set_selection '//variable_name)

    end if ! static

    ! submit values for writing
    call adios2_put(adios_engine, variable, values, adios2_mode_deferred, adios_return)
    ! check for errors
    call check_return_value(adios_return, 'write_cell_variable:put '//variable_name)

    ! end the current time step
    call adios2_end_step(adios_engine, adios_return)
    ! check for errors
    call check_return_value(adios_return, 'finish_adios_step:finish')

    ! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    ! shutdown

    ! flush everything to disk
    call adios2_flush_all(adios, adios_return)
    ! check for errors
    call check_return_value(adios_return, 'finalize_adios:flush')

    ! close engine
    call adios2_close(adios_engine, adios_return)
    ! check for errors
    call check_return_value(adios_return, 'finalize_adios:close')

    ! finalize adios object
    call adios2_finalize(adios, adios_return)
    ! check for errors
    call check_return_value(adios_return, 'finalize_adios')

    ! shut down MPI
    call mpi_finalize(mpi_error)

end program test

@pnorbert
Contributor

Hi Hans, thanks for the example. It indeed fails to work properly with 2.7.1.436.
However, it works as expected with 2.8 and master, with both the BP4 and BP5 file formats and engines.
Can you try the latest release and upgrade if it works for you?

@hr87
Author

hr87 commented Jul 20, 2022

Yes, updating fixed this problem. Thanks for the help.

@hr87 closed this as completed on Jul 20, 2022