API Change request: replace Mode:Read in Open with two new modes #1384

pnorbert · 2019-04-25T15:44:05Z

Request: Have two separate and explicit modes for reading step-by-step as a stream and for reading arbitrary step(s) in post-processing. Instead of
io.Open(fname, adios2::Mode::Read);
have
io.Open(fname, adios2::Mode::ReadStepByStep);
io.Open(fname, adios2::Mode::ReadAsFile);
(names are debatable)

In the high-level Python API we would replace "r" with "rs" and "rf", and an error on "r" would explain the options.

Issue: We want to implement proper streaming through file with the BP4 format. We want to keep in memory the metadata only about a limited number of steps. File-based engines right now must process all metadata and present all steps at Open() time in case the read code uses SetStepSelection() and does not use BeginStep(). Only if the code calls BeginStep(), do we know that it is now doing step-by-step reading.

Reasoning: We always had this conflict to support the special case of reading arbitrary steps which only works in file-based engines. The programmer must decide in the beginning of coding, which way to go. Either read step-by-step, or read arbitrary steps. The source code will be very different and there is no going back and forth. So it seems fine to force the programmer to tell ADIOS, which way the code goes.

The separate modes will allow us to write two separate reader engine classes for the BP4 engine and choose between them in the IO class at Open(). The step-by-step reader then can keep the metadata consumption down.
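A minimal sketch of that dispatch, assuming hypothetical names throughout (`OpenReader`, `StepByStepReader`, `FileReader`, and the `stepsOnDisk` argument that stands in for actually reading a file). It is not the ADIOS2 implementation; it only illustrates how the mode passed to Open() could select a reader class with a smaller metadata footprint.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>

// All names here are hypothetical; none of this is ADIOS2 code.
enum class Mode { Write, ReadStepByStep, ReadAsFile };

struct Reader
{
    virtual ~Reader() = default;
    // Number of steps whose metadata is held in memory after Open().
    virtual std::size_t StepsInMemory() const = 0;
};

// Streaming-style reader: only the current step's metadata is kept.
struct StepByStepReader : Reader
{
    std::size_t StepsInMemory() const override { return 1; }
};

// Post-processing reader: metadata of every step is loaded up front.
struct FileReader : Reader
{
    explicit FileReader(std::size_t stepsOnDisk) : m_Steps(stepsOnDisk) {}
    std::size_t StepsInMemory() const override { return m_Steps; }

private:
    std::size_t m_Steps;
};

// Stand-in for IO::Open(): the mode alone decides which class is built.
// stepsOnDisk replaces actually reading the file called `name`.
std::unique_ptr<Reader> OpenReader(const std::string &name, Mode mode,
                                   std::size_t stepsOnDisk)
{
    switch (mode)
    {
    case Mode::ReadStepByStep:
        return std::unique_ptr<Reader>(new StepByStepReader());
    case Mode::ReadAsFile:
        return std::unique_ptr<Reader>(new FileReader(stepsOnDisk));
    default:
        throw std::invalid_argument("not a read mode for " + name);
    }
}
```

With a 100-step file, `OpenReader("sim.bp", Mode::ReadStepByStep, 100)` would hold one step of metadata while `Mode::ReadAsFile` would hold all 100.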

Options: we thought about different options:

1. engine parameter set in source or XML to differentiate
2. different engine names to differentiate
3. two separate OpenFile() and OpenStream() functions
4. two modes in Open()

All of these would achieve the same, but the first two just seem more cumbersome to use; moreover, they are an option for the user, while the decision has to be made by the developer. The third is less elegant but very much equivalent: force the developer to tell ADIOS what is going on.

Let us know if you agree or disagree.

germasch · 2019-04-25T16:37:24Z

I have one side comment: If you're changing the API with regards to those flags, you might as well use the opportunity to separate the sync/deferred ("launch") flags from the "open" flags:

/** OpenMode in IO Open */
enum class Mode
{
    Undefined,
    // open modes
    Write,
    Read,
    Append,
    // launch execution modes
    Sync,
    Deferred
};

For example, have a separate OpenMode::{Write,ReadWhatever,Append}. Right now, it looks like you can pass Mode::Sync to Open(), which doesn't make a lot of sense, though.
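A rough sketch of what such a split could look like. The names `OpenMode`, `LaunchMode`, and the toy `IO`/`Engine` types are assumptions for illustration, not the current ADIOS2 API. The point is that with two distinct `enum class` types, passing a launch flag to Open() stops compiling instead of being accepted silently.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical split of the single Mode enum; not the current ADIOS2 API.
enum class OpenMode { Write, Read, Append };
enum class LaunchMode { Sync, Deferred };

struct Engine
{
    OpenMode openMode;
};

struct IO
{
    // Only an OpenMode is accepted here.
    Engine Open(const std::string & /*name*/, OpenMode mode)
    {
        return Engine{mode};
    }
};

// Detects at compile time whether IO::Open(name, M) is a well-formed call.
template <class M>
auto CanOpenWithImpl(int)
    -> decltype(std::declval<IO &>().Open(std::string(), std::declval<M>()),
                std::true_type());
template <class M>
std::false_type CanOpenWithImpl(...);

template <class M>
using CanOpenWith = decltype(CanOpenWithImpl<M>(0));

static_assert(CanOpenWith<OpenMode>::value, "OpenMode is accepted by Open()");
static_assert(!CanOpenWith<LaunchMode>::value,
              "LaunchMode is rejected by Open() at compile time");
```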

germasch · 2019-04-25T16:50:19Z

On the main issue, let's say the proposed solution sounds kind of "kludgy" to me, though I also don't have a good understanding of what all the different engines do. But it kind of sounds like you ask the user to decide what they're going to access at open time to avoid the complexity of implementing the corresponding logic inside of ADIOS2.

It would help if it was actually documented what the goals with BP4 are. I gather, you want to avoid having to read all the metadata (on all procs?), which certainly seems worthwhile. But I don't necessarily see why you need to know at open time, you could just read metadata lazily, ie, read a piece of info only when it's needed.
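To make the "read lazily" idea concrete, here is a minimal sketch under stated assumptions: `LazyMetadataReader`, `VariablesAt`, and `ParseStep` are hypothetical names, and the parsing is faked (variable "T" in every step, "x" from step 1 on). Nothing is parsed at construction; each step's metadata is parsed the first time it is asked for and then cached.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical file reader that parses per-step metadata on demand.
class LazyMetadataReader
{
public:
    // "Open": remembers how many steps exist, parses nothing.
    explicit LazyMetadataReader(std::size_t stepsOnDisk)
    : m_StepsOnDisk(stepsOnDisk)
    {
    }

    // Variable names present in `step`; parsed on first request, then cached.
    const std::vector<std::string> &VariablesAt(std::size_t step)
    {
        auto it = m_Parsed.find(step);
        if (it == m_Parsed.end())
        {
            it = m_Parsed.emplace(step, ParseStep(step)).first;
        }
        return it->second;
    }

    // How many steps have had their metadata parsed so far.
    std::size_t ParsedSteps() const { return m_Parsed.size(); }

private:
    // Fake parser: "T" exists in every step, "x" only from step 1 on.
    std::vector<std::string> ParseStep(std::size_t step) const
    {
        if (step >= m_StepsOnDisk)
        {
            throw std::out_of_range("step is not in the file");
        }
        std::vector<std::string> names{"T"};
        if (step > 0)
        {
            names.push_back("x");
        }
        return names;
    }

    std::size_t m_StepsOnDisk;
    std::map<std::size_t, std::vector<std::string>> m_Parsed;
};
```

A reader that only ever touches steps 5 and 10 of a 100-step file would end up with two parsed steps in this model, without having declared its access pattern at open time.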

> The programmer must decide in the beginning of coding, which way to go. Either read step-by-step, or read arbitrary steps. The source code will be very different and there is no going back and forth.

Can you explain why this is? For an application, it doesn't really sound like there's a difference between "read the first step, then the next step" and "read step 5, then read step 10", so I'm not sure why you want to force that?

A maybe somewhat related point is that there seem to be many different ways of handling steps even in streaming applications, e.g., "read latest step, drop previous ones" vs "read step by step even if it blocks the writer", etc. I could well imagine a scenario where even in the streaming context, more than exactly one step might be accessible on the reader side, too. (That can be useful to calculate deltas during in-situ analysis). The point here is just that going to a model which has exactly two options might well not be future proof, and adding new open modes probably isn't a good way to handle all these different scenarios.

pnorbert · 2019-04-25T18:01:53Z

Thank you, Kai, for your observations. The lazy metadata reading got me and @wfgodoy thinking, and we figured that there is a way to do this with more interaction between the IO and Engine classes. So we put this change request on hold until we figure out which way to go. We were distracted by thinking of the most complicated cases, where variables and attributes appear at any step, their dimensions may change at any step, and we need to process the entire metadata of all steps in Open to allow the user to call IO.InquireVariable() or Engine.AllStepsBlockInfo(). We can postpone that processing to the actual call that uses it, and Open does not need to process any metadata.

On another note, since you asked what the difference is between the two modes: source code written with the assumption in mind that all steps are accessible at all times (i.e. traditional file reading) cannot be used in step-by-step reading. So you cannot use the majority of engines anyway. Look at the Heat Transfer example in the adiosvm repo, the Fortran analysis code (heatAnalysis_adios2_*.F90): https://github.com/pnorbert/adiosvm/tree/master/Tutorial/heat2d/fortran/analysis The file version of the code cannot run in streaming mode. And this is a kinda-similar-to-streaming example because it reads the variable data step by step. If you want, rewrite it to read the entire dataset at once (into a 3D array) and loop over its last dimension in memory.

So in contrast to your opinion, the one Read mode approach has always seemed "kludgy" to me, because it is used for two orthogonal coding approaches (file-based with all steps available vs. step-by-step-only-forward) and the user must decide how they plan to access the data at open time. Once you inquire a variable first, BeginStep() will throw an exception. Once you call BeginStep(), SetStepSelection() will throw an exception. There is no jumping back and forth once you have made your first move.

So in the end we go out of our way to implement more complicated logic in the file engines instead of having two of each (for reading), each doing one simple thing. But this is how it was decided in the beginning of ADIOS2, since we did not like the ADIOS1 approach of having two separate Open functions either.


germasch · 2019-04-26T01:33:02Z

> Thank you Kai for your observations. The lazy metadata reading got me and @wfgodoy thinking and we figured that there is a way to do this, with more interactions between the IO and Engine class. So we put this change request on hold until we figure which way to go. We were distracted by thinking of the most complicated cases, where variables and attributes appear at any step, their dimensions may change at any step, and we need to process the entire metadata of all steps in Open to allow the user to call IO.InquireVariable() or Engine.AllStepsBlockInfo(). We can postpone that processing to the actual call that uses it and Open does not need to process any metadata.

I'd say that generally sounds preferable.

> On another note, since you asked what the difference is between the two modes: source code written with the assumption in mind that all steps are accessible at all times (i.e. traditional file reading) cannot be used in step-by-step reading. So you cannot use the majority of engines anyway. Look at the Heat Transfer example in the adiosvm repo, the Fortran analysis code (heatAnalysis_adios2_*.F90): https://github.com/pnorbert/adiosvm/tree/master/Tutorial/heat2d/fortran/analysis The file version of the code cannot run in streaming mode.

I can see that, though I guess I'm not sure I'd say that's desirable. So I guess I have kind of the picture of a music player in my mind. You can say "play", "pause", "next song". If it's a CD player, you can also directly say you want song 17, and it can show you how many songs there are. A streaming music player may have a "skip to next song" button, maybe a "skip back" button. For a general music player API, I'd expect it to have an API that lets me do "skip back" or "select song #" functions. And if the underlying engine doesn't allow that, I'd get a "not supported" error back. But I wouldn't expect to have to write my code in a way where I have to know whether it's a CD player or a streaming player if I just wanted to do "open" and "play".

So I think a lot of the complexity of the issue is related to the fact that you have SetStepSelection on a per-variable basis, and the fact that it includes a count. That leads to the situation where in a file, you can inquire a variable, but then you can't directly "Get" from it, because it doesn't exist for the current step. OTOH, if you're streaming, you would have the InquireVariable fail until you actually get to the step that has it. I think these kinds of subtle differences between file-based and streaming-based I/O aren't really desirable, considering that you're trying to abstract away whether the I/O is done on a file or a stream in ADIOS2. I.e., in theory you can just change the XML file to switch from one to the other, which is nice. But then if the app actually fails anyway, because of such differences which require the app code to be changed, that makes the XML runtime config much less powerful.

Obviously, you can't abstract away all differences between things that are fundamentally different. But I'd say it'd be good to minimize them. So, to stay closer to my music player example, say you had a general "SetStep" function in addition to "BeginStep". Then I think you could make both of these work, on both files and streams, and I'd say that would make life easier.

for (int i = 0; i < 5; i++) {
  io.SetStep(i);
  var.Get(...);
}

or

for (int i = 0; i < 5; i++) {
  io.BeginStep(i);
  var.Get(...);
  io.EndStep();
}

On the other hand, if you have a stream and you say SetStep(5) while you're currently at step 10, you might have to return some kind of "not supported". You can think about what you'd want to do if you say SetStep(10) while you're currently at step 5 -- the options would be to wait until step 10 arrives in the stream, or you could say "not supported". (I think I'd prefer the former, because it mirrors what happens with a file).
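One possible reading of those semantics, sketched with a mock stream. `SetStep` does not exist in ADIOS2; it is the function proposed above, and `ForwardOnlyReader` and this `StepStatus` enum are hypothetical. In the sketch, moving backwards returns "not supported", moving forwards succeeds (a real stream would block until the step arrives and discard the steps in between), and re-selecting the current step is allowed.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical status codes for the proposed SetStep().
enum class StepStatus { OK, NotSupported, EndOfStream };

// Mock stream reader that can only move forward.
class ForwardOnlyReader
{
public:
    // stepsInStream: how many steps the (mock) writer will ever produce.
    explicit ForwardOnlyReader(std::size_t stepsInStream)
    : m_StepsInStream(stepsInStream)
    {
    }

    StepStatus SetStep(std::size_t step)
    {
        if (m_Started && step < m_Current)
        {
            return StepStatus::NotSupported; // a stream cannot rewind
        }
        if (step >= m_StepsInStream)
        {
            return StepStatus::EndOfStream; // the writer never produces it
        }
        // A real stream would wait here until `step` has arrived.
        m_Current = step;
        m_Started = true;
        return StepStatus::OK;
    }

    std::size_t CurrentStep() const { return m_Current; }

private:
    std::size_t m_StepsInStream;
    std::size_t m_Current = 0;
    bool m_Started = false;
};
```

A file-backed reader could implement the same `SetStep` signature and simply never return `NotSupported`, which is what would let the calling loop stay identical for both.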

Now, this is all much more complicated due to the step selection being per variable. I'm not sure how useful that is -- is there really anyone who needs to simultaneously get the value of one variable at step 5, and another one at step 10? Ie., wouldn't this be clearer, anyway:

io.SetStep(5);
var1.Get(...);
io.SetStep(10);
var2.Get(...);

> And this is a kinda-similar-to-streaming example because it reads the variable data step by step. If you want, rewrite it to read the entire dataset at once (into a 3D array) and loop over its last dimension in memory. So in contrast to your opinion, the one Read mode approach has always seemed "kludgy" to me, because it is used for two orthogonal coding approaches (file-based with all steps available vs. step-by-step-only-forward) and the user must decide how they plan to access the data at open time. Once you inquire a variable first, begin step will throw an exception. Once you do begin step, SetStepSelection() will throw an exception.

Well, I guess I'll agree that what you describe is kludgy. I've seen only one example where SetStepSelection is used with a count != 1. If you want to support that, that's just not really possible with a stream (well, it might be possible, but not easily/efficiently). I'd say I'm not sure SetStepSelection with a count != 1 is all that useful -- at least, the same can be achieved by a loop getting (a slice of) a variable at a time, and I don't think the performance would be much worse in that case. But anyway, you could support that on files only and have it fail when used on a stream. OTOH, reading data step by step is perfectly reasonable to support for both files and streams, transparently to the user.

> There is no jumping back and forth once you did your first move.

Right, that may be true, but I'm not sure it needs to be that way. I mean even in the streaming case, why shouldn't you be able to set the selection to the current step? Or, why in the file case prohibit reading it step-by-step with the streaming interface? I guess I agree that if you have two totally different ways of accessing files / streams, different open functions are appropriate, in fact I might even say that they should return different things, a stream reader vs a file reader where one has certain apis and the other doesn't.

But still, since file read is basically like stream read while allowing some additional operations, I'd rather see one unified interface as far as possible, and an error if you do something on a stream that's only supported on a file.

pnorbert · 2019-04-26T12:54:58Z

Kai, We have a proper API for I/O and you can freely switch between engines to read from a stream or a file with the same code. This includes BeginStep/EndStep and Put/Get. You are encouraged to write your code this way and then you can have what you wanted. We also have an extension that only applies to files where all steps are available at once. The extension allows for reading arbitrary steps or multiple steps of a variable. This is for codes that need random access to the entire dataset. There is nothing here that we intend to support in streams. Codes using this feature will never work in situ. Hence I was asking for modes like ReadStepByStep and ReadPostProcessing, not ReadStream vs ReadFile.


germasch · 2019-04-26T15:29:22Z

Okay, so having BeginStep/EndStep work the same for both files and streams is good, and in fact that's what I was arguing for (ie, have the usual case work transparently, no matter what happens underneath). I think I misunderstood your previous message, I thought you were saying once you open as a file, you'd have to use SetStepSelection and couldn't use BeginStep/EndStep.

I do think, though, that there still are subtle differences when using the "streaming" interface depending on the underlying file vs stream, even if random access is not used. In trying to look into this, I hit two other bugs, so I'll file those separately first.
[As a side note, I've mainly just used adios2 without step support at all, as I prefer to get separate output files per timestep, which are already unpleasantly large, rather than one huge file, so I don't have much experience with Begin/EndStep or SetStepSelection].

For the particular issue at hand, if you do think the user needs to tell you what they're going to do at open time, I'd prefer to keep "Mode::Read" but change the meaning to "open in streaming mode", and add "Mode::ReadRandomAccess" or something like that which would work only on files but not on streams, and then support SetStepSelection etc. I haven't tried, but I think the vast majority of tests and examples would require no change in that case.

> We also have an extension that only applies to files where all steps are available at once. The extension allows for reading arbitrary steps or multiple steps of a variable. This is for codes that need random access to the entire dataset. There is nothing here that we intend to support in streams. Codes using this feature will never work in situ.

Sure, random access isn't compatible with streaming. But there are cases in between simple sequential access and entirely random access. For example, in post processing one might have code which calculates diagnostics from only every 5th step. If one wants to change that processing into in-situ diagnostics, it'd be nice if one could express the same thing directly rather than some kind of for (int i = 0; i < 4; i++) { reader.BeginStep(); reader.EndStep(); } // drop 4 steps in between. There is some analogy with C++ iterators, ie., there are cases in between InputIterator and RandomAccessIterator. I'm not saying anything should be done to that end at this time, but when thinking about APIs, I think one should realize that the future may bring things beyond what's currently supported.
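As an illustration of the "every nth step" case, the skipping loop can at least be wrapped once in a helper that uses nothing but BeginStep/EndStep, so it would in principle work on any engine. The sketch below is an assumption-laden stand-in: `ForEveryNthStep` and `MockReader` are hypothetical names, and the mock's `BeginStep()` returns a plain `bool` for brevity rather than a status code.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Mock forward-only reader; BeginStep() returns false once the data ends.
class MockReader
{
public:
    explicit MockReader(std::size_t steps) : m_Steps(steps) {}
    bool BeginStep() { return m_Next < m_Steps; }
    std::size_t CurrentStep() const { return m_Next; }
    void EndStep() { ++m_Next; }

private:
    std::size_t m_Steps;
    std::size_t m_Next = 0;
};

// Calls visit(reader) on steps 0, stride, 2*stride, ... and drops the steps
// in between by opening and closing them. Returns the visited step indices.
template <class Reader, class Visit>
std::vector<std::size_t> ForEveryNthStep(Reader &reader, std::size_t stride,
                                         Visit visit)
{
    if (stride == 0)
    {
        throw std::invalid_argument("stride must be positive");
    }
    std::vector<std::size_t> visited;
    for (std::size_t i = 0; reader.BeginStep(); ++i)
    {
        if (i % stride == 0)
        {
            visit(reader);
            visited.push_back(reader.CurrentStep());
        }
        reader.EndStep();
    }
    return visited;
}
```

This still pays the cost of stepping through the dropped steps one by one, which is the inefficiency a richer streaming interface could avoid.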

I'll add one more thing: For code coupling or in-situ diagnostics things are likely deterministic (do every step, or every nth step). On the other hand, for a web status interface, the reader side might want to express something like "get me the data from the latest step available". I think that's supportable by a streaming interface, and I can well imagine that being useful in real life.

williamfgc · 2019-04-26T20:11:06Z

@germasch in principle there is nothing "RandomAccess" can't do that streaming (BeginStep/EndStep) can do. There is no in-between of random and streaming modes; we either use one or the other (partial random access is already random access). To me, the biggest advantage of enforcing this separation and making the two mutually exclusive (either use BeginStep/EndStep or SetStepSelection, but not both) is that we completely separate portable and non-portable code across adios2 engines.

I agree with the above that lazy evaluation is the way to go; thanks for bringing it up. This is already done in the Python and C++ high-level APIs (which support both modes, BTW): open (or the constructor) doesn't do a physical open until the first write/read (similar to copy-on-write), and parameters can be set in between.

Lazy evaluation of Open will also allow setting a "Step View" into the metadata file for random access; today the default is an "all steps view".

germasch · 2019-04-26T20:48:00Z

So I guess one thing I can take away from #1387 is that, at least for me, the semantics of when a variable exists / is valid / can be inquired legally are really rather subtle.

Thinking about this a bit more, I get a feeling that the need for the complicated interface / switching between streaming and random access mode is actually a consequence of how the core of adios2 has been designed, with Variable being used as one object that serves (at least) two purposes: helping engines keep track of their internal state, i.e., which variables exist in the file/stream, as well as underlying the user-facing API.

This is somewhat shown by the fact that if I avoid using cxx11::Variable on the user side, everything works just fine:

    for (int step = 0; step < 3; step++) {
      reader.BeginStep();
      reader.Get("x", x);
      reader.EndStep();
      printf("step %d: %g %g %g\n", step, x[0], x[1], x[2]);
    }

If "x" doesn't exist in a given step, I get an exception, and otherwise things work just fine, which is exactly what I would expect. Using cxx11::Variable, however, things depend on when I call InquireVariable, and what was a valid variable may change into an invalid one when I move on to the next step (well, I guess it should change into an invalid one, I don't think it actually does).

Let me mention some other seemingly unrelated issue which I hit on day 1 of using ADIOS2, and which I still hit occasionally, though by now I recognize it quickly. It's kinda obvious in this snippet:

    auto var = m_Io.DefineVariable<double>("x", {3});
    adios2::Engine reader = m_Io.Open("xxx.bp", adios2::Mode::Read);

This throws a C++ exception with the description "ERROR: variable x exists in IO object CXX11_API_TestIO, in call to DefineVariable" on the line where Open is called. That confused me to no end, since when I called DefineVariable, there was no exception, and when I got the exception, I didn't call DefineVariable. Reading and writing variables is not symmetric, which was surprising (to me, anyway). In my checkpointing use of adios2, I have to work around this, as my code follows the same path for reading and writing, and all I want is to store Variable objects for the members, and I basically have to do a "try InquireVariable, if it fails DefineVariable", even though all I want to do is to say "I need a variable of type x named y".

So let me kick around an idea: Allow for cxx11::Variable to exist even if it doesn't exist in the file. Basically, the user-facing Variable is mostly just a string (and a type). It will be connected to the internal core::Variable if that exists. There's no need to have cxx11::Variable to have different behavior for streaming / random access then, you can inquire it before or after BeginStep (or even before Open). It's just a handle for a variable which may or may not exist. So cxx11::Variable is really just a handle for a "potential" Variable, which may become valid/invalid at different steps, or which might not exist in the file yet at all (as is the case in writing mode, anyway). If you're trying to "Get" the variable, throw an exception if it doesn't exist in the current step. If you're calling "SetStepSelection" on it and it doesn't exist in the engine in random access fashion, throw an exception then.

I haven't done this, and I haven't completely thought it through to the end. But I have a feeling a lot of the subtlety and differences between modes would go away on the user side.
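To give the idea some shape, here is a minimal sketch using only mock types. `VariableHandle` and `MockEngine` are hypothetical names and none of this is the cxx11 bindings' actual design. The handle is just a name; it can be created before any step is open, its validity is checked against whatever step the engine is currently in, and Get throws if the variable is absent from that step.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Mock engine: each step is a map from variable name to a single value.
class MockEngine
{
public:
    using Step = std::map<std::string, double>;

    explicit MockEngine(std::vector<Step> steps) : m_Steps(std::move(steps)) {}

    bool BeginStep()
    {
        if (m_Next >= m_Steps.size())
        {
            return false;
        }
        m_Current = m_Next++;
        m_InStep = true;
        return true;
    }

    void EndStep() { m_InStep = false; }

    // Null if no step is open or the variable is absent from the open step.
    const double *Find(const std::string &name) const
    {
        if (!m_InStep)
        {
            return nullptr;
        }
        auto it = m_Steps[m_Current].find(name);
        return it == m_Steps[m_Current].end() ? nullptr : &it->second;
    }

private:
    std::vector<Step> m_Steps;
    std::size_t m_Next = 0;
    std::size_t m_Current = 0;
    bool m_InStep = false;
};

// User-facing handle: only a name, resolved against the engine on each use.
class VariableHandle
{
public:
    explicit VariableHandle(std::string name) : m_Name(std::move(name)) {}

    // True only while a step containing this variable is open.
    bool ExistsIn(const MockEngine &engine) const
    {
        return engine.Find(m_Name) != nullptr;
    }

    // Throws if the variable does not exist in the current step.
    double Get(const MockEngine &engine) const
    {
        const double *value = engine.Find(m_Name);
        if (!value)
        {
            throw std::invalid_argument("variable " + m_Name +
                                        " is not in the current step");
        }
        return *value;
    }

private:
    std::string m_Name;
};
```

In this model the same handle goes from invalid to valid (and possibly back) as steps advance, so the calling code looks the same whether the engine is backed by a file or a stream.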

williamfgc · 2019-04-27T01:12:58Z

Not sure if subtle, but https://adios2.readthedocs.io already covers many of the mentioned topics (Get("x",x) is a wrapper that calls InquireVariable ; Variable<T> var; is already a placeholder as it has an empty constructor, it also has a state between BeginStep/EndStep through the operator bool, APIs are symmetric as much as DefineVariable/InquireVariable, Put/Get, allow for a reversible IO object state at Write and Read, etc.). Feedback is always encouraged if something is not clear or is missing in the docs.

Also, I personally don't see anything complicated with Variable (or ADIOS2 besides the typical learning curve), it's just an object with a state. So far, my observation is that complications arise when: 1) ADIOS2 APIs are applied outside the current intended use-case scope reflected in tests/examples/tutorials/applications, and 2) bugs :)

germasch · 2019-04-27T03:40:16Z

Thanks for pointing me to the docs. I know what Get("x", x) does; that's why I used it in my example. The point is that when using this version of Get, it actually works as expected, as opposed to my original version with InquireVariable, which did not (#1387).

I seem to have trouble getting my point across, so I put some code where my mouth is. I'll put the code into a PR so you can look at it (though, while it demonstrates that the issue can be fixed quite easily, I did it rather hackily, at the wrong level of abstraction, etc., so it should definitely not be merged).

I first added this test, which assumes that xxx.bp contains the variable "x", but not in the first step:

    adios2::Engine reader = m_Io.Open("xxx.bp", adios2::Mode::Read);

    std::array<double, 3> x;
    auto var = m_Io.InquireVariable<double>("x");
    for (int step = 0; step < 3; step++)
    {
        reader.BeginStep();
        if (step == 0)
        {
            EXPECT_FALSE(var);
            EXPECT_THROW(reader.Get(var, x.data()), std::invalid_argument);
        }
        else
        {
            EXPECT_TRUE(var);
            reader.Get(var, x.data(), adios2::Mode::Sync);
            EXPECT_EQ(
                x, (std::array<double, 3>{10. + step, 20. + step, 30. + step}));
        }
        reader.EndStep();
    }
    reader.Close();

It's essentially the same as in #1387, and it fails. The actual fix I introduced changes fewer than 20 lines, and the test now passes. In addition, while I haven't actually tested it, I'm fairly sure that the above code will now also work if a streaming engine is used. As a side effect, it fixes another, not particularly critical, bug where cxx11::Variable might be left with a dangling pointer to a core::Variable which has gone away. It does change behavior, though I think existing (working) code won't break. The tests on the PR will tell. It may well introduce new bugs, and it introduces some inefficiency. However, that's related to the fact that, as I said, this isn't the right fix at the right level; it's not a fundamental limitation.

germasch · 2019-04-28T03:16:34Z

FWIW, the proof-of-concept change in #1391 did in fact cause only a little bit of breakage. One test failure comes from some case I didn't think through (now fixed), and the remaining three are related to what I think is existing "surprising" behavior at the least, which I documented in #1392.

I'd like to make the point that I think the behavior I'm proposing makes reading and writing more symmetric:

Existing behavior on (streaming or file) write:

var = DefineVariable() once (either before the BeginStep/EndStep loop, or before the first time a variable is Put).
Then, at each step, Put the variable if so desired, or skip it. The var remains valid the entire way through.

Existing behavior on streaming read:

Can't do var = InquireVariable() once, have to do it each time after BeginStep.
Then call Get

Existing behavior on file read:

Can do var = InquireVariable() once, but then have to use SetStepSelection.
Then call Get.

My proposed behavior is to support the following for both file and streaming read (the previous two approaches continue to work, though):

Can do var = InquireVariable() once before step loop (or first time you want to Get within step loop)
Then call Get within step loop (or skip it). This call will fail (throw) if variable doesn't exist in current step.

So it's mostly backwards compatible and allows reading code to be written to mirror the writing code. There is one hitch, which is, how do you find out if a variable exists in the current step? It can be addressed in various ways, but I haven't done it.
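To make the proposed read pattern concrete, here is a minimal, self-contained toy model. It is not ADIOS2 code and not the change in #1391: ToyReader, ToyVariable, Has and ReadFirstElements are hypothetical names invented for illustration. The sketch assumes the user-facing handle stores only the variable name and is resolved against the current step on every access, so it can be created once before the step loop and cannot dangle. Has() is one conceivable answer to the "does it exist in the current step?" question, not an existing API.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for per-step metadata: variable name -> data written in that step.
using StepData = std::map<std::string, std::vector<double>>;

// Hypothetical user-facing handle. It holds only a name, never a pointer
// into per-step state, so it stays valid across steps.
class ToyVariable
{
public:
    explicit ToyVariable(std::string name) : m_Name(std::move(name)) {}
    const std::string &Name() const { return m_Name; }

private:
    std::string m_Name;
};

// Hypothetical reader that walks through a fixed list of steps.
class ToyReader
{
public:
    explicit ToyReader(std::vector<StepData> steps) : m_Steps(std::move(steps)) {}

    // Advance to the next step; returns false when no steps are left.
    bool BeginStep()
    {
        if (m_Next >= m_Steps.size())
        {
            return false;
        }
        m_Current = m_Next++;
        m_InStep = true;
        return true;
    }

    void EndStep() { m_InStep = false; }

    // True if the variable is present in the step that is currently open.
    bool Has(const ToyVariable &var) const
    {
        return m_InStep && m_Steps[m_Current].count(var.Name()) > 0;
    }

    // Resolve the handle against the current step; throw if it is absent.
    const std::vector<double> &Get(const ToyVariable &var) const
    {
        if (!Has(var))
        {
            throw std::invalid_argument("variable '" + var.Name() +
                                        "' is not available in the current step");
        }
        return m_Steps[m_Current].at(var.Name());
    }

private:
    std::vector<StepData> m_Steps;
    std::size_t m_Next = 0;
    std::size_t m_Current = 0;
    bool m_InStep = false;
};

// Reading code that mirrors the write side: create the handle once,
// then Get it (or skip it) inside the step loop.
std::vector<double> ReadFirstElements(ToyReader &reader)
{
    const ToyVariable x("x"); // created once, before the step loop
    std::vector<double> firsts;
    while (reader.BeginStep())
    {
        if (reader.Has(x))
        {
            firsts.push_back(reader.Get(x).at(0));
        }
        reader.EndStep();
    }
    return firsts;
}
```

With three steps where "x" is missing from the first one, ReadFirstElements returns two values, and calling Get on the handle during the first step throws std::invalid_argument, which matches the expectations in the test quoted earlier.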

None of this changes that the PR is just a demonstration that an API can be implemented that doesn't require distinguishing streaming from file mode, and that's largely backwards compatible. It's still true that the way I've actually implemented it is entirely "wrong", though.


pnorbert added enhancement discussion labels Apr 25, 2019

pnorbert self-assigned this Apr 25, 2019

williamfgc mentioned this issue Apr 26, 2019

no error when getting a variable that doesn't exist in current step #1387

Open

germasch mentioned this issue Apr 27, 2019

[WIP] separate user-facing Variable from internal Variable #1391

Closed
