IterationEncoding: variableBased #250

Open: wants to merge 9 commits into base: upcoming-2.0.0

@ax3l ax3l commented Mar 8, 2021

Please add a brief description (one sentence) here and link the issue this pull-request implements

Implements issue: #221 #236

Description

Add standard guidance for stepBased iteration encoding.

stepBased iteration encoding uses features of a storage, backend API or file format to encode time-varying data sets and attributes.

Affected Components

  • base
  • FORMAT: ADIOS

Logic Changes

Instead of storing iterations (snapshots) in individual groups, we rely on internal capabilities of a data format to store updates/revisions.

ADIOS:

Writer Changes

Reader Changes

Data Converter

Since this is a new iteration encoding that exists in parallel to existing iteration encodings, no conversion is necessary.

@ax3l ax3l added the major change non-backwards compatible change label Mar 8, 2021
@ax3l ax3l requested a review from franzpoeschel March 8, 2021 23:37
@ax3l ax3l changed the title [WIP] IterationEncoding: stepBased IterationEncoding: stepBased Mar 9, 2021
FORMAT_ADIOS.md Outdated

In order to correlate openPMD iterations with ADIOS steps, the *root* group (path `/`) in ADIOS must contain a variable:

- `__step__`

@ax3l ax3l Mar 9, 2021

Make sure we used __step__ and not __steps__


@ax3l ax3l Apr 6, 2021

Discussed with @franzpoeschel: let's call this `snapshot`, without the `__`, in line with the openPMD 2.0 "iteration" naming: #148

We read snapshot in frontend classes in openPMD-api and it might be useful for other use cases, e.g. data sets with only one snapshot (iteration).

Note that in the current implementations this attribute is not part of the root group but lives in `/data`, so its full path is `/data/snapshot`.
In variable-based encoding this makes the attribute look like part of the iteration:

> bpls ../samples/variableBasedSeries.bp4 -alt                                                                    
Step 0:                                                                                                                      
  string    /basePath                                                            attr   = "/data/%T/"                        
  uint64_t  /data/changing_value                                                 attr   = 0                                  
  double    /data/dt                                                             attr   = 1                                  
  double    /data/meshes/E/0/position                                            attr   = 0                                  
  uint64_t  /data/meshes/E/0/shape                                               attr   = 1                                  
  double    /data/meshes/E/0/unitSI                                              attr   = 1                                  
  uint64_t  /data/meshes/E/0/value                                               attr   = 0                                  
  uint64_t  /data/meshes/E/attr_0                                                attr   = 0                                  
  string    /data/meshes/E/axisLabels                                            attr   = {"x"}                              
  string    /data/meshes/E/dataOrder                                             attr   = "C"                                
  string    /data/meshes/E/geometry                                              attr   = "cartesian"                        
  double    /data/meshes/E/gridGlobalOffset                                      attr   = 0                                  
  double    /data/meshes/E/gridSpacing                                           attr   = 1                                  
  double    /data/meshes/E/gridUnitSI                                            attr   = 1                                  
  float     /data/meshes/E/timeOffset                                            attr   = 0                                  
  double    /data/meshes/E/unitDimension                                         attr   = {0, 0, 0, 0, 0, 0, 0}              
  int32_t   /data/meshes/E/x                                                     {1000}                                      
  double    /data/meshes/E/x/position                                            attr   = 0
  double    /data/meshes/E/x/unitSI                                              attr   = 1
  int32_t   /data/meshes/E/y                                                     {1}
  double    /data/meshes/E/y/position                                            attr   = 0
  double    /data/meshes/E/y/unitSI                                              attr   = 1
  string    /data/meshes/changing_constant/axisLabels                            attr   = {"x"}
  uint64_t  /data/meshes/changing_constant/changing_constant/shape               attr   = 0
  uint64_t  /data/meshes/changing_constant/changing_constant/value               attr   = 0
  string    /data/meshes/changing_constant/dataOrder                             attr   = "C"
  string    /data/meshes/changing_constant/geometry                              attr   = "cartesian"
  double    /data/meshes/changing_constant/gridGlobalOffset                      attr   = 0
  double    /data/meshes/changing_constant/gridSpacing                           attr   = 1
  double    /data/meshes/changing_constant/gridUnitSI                            attr   = 1
  double    /data/meshes/changing_constant/position                              attr   = 0
  uint64_t  /data/meshes/changing_constant/shape                                 attr   = 0
  float     /data/meshes/changing_constant/timeOffset                            attr   = 0
  double    /data/meshes/changing_constant/unitDimension                         attr   = {0, 0, 0, 0, 0, 0, 0}
  double    /data/meshes/changing_constant/unitSI                                attr   = 1
  uint64_t  /data/meshes/changing_constant/value                                 attr   = 0
  uint64_t  /data/particles/changing_constant/position/position/shape            attr   = 0
  uint64_t  /data/particles/changing_constant/position/position/value            attr   = 0
  uint64_t  /data/particles/changing_constant/position/shape                     attr   = 0
  float     /data/particles/changing_constant/position/timeOffset                attr   = 0
  double    /data/particles/changing_constant/position/unitDimension             attr   = {1, 0, 0, 0, 0, 0, 0}
  double    /data/particles/changing_constant/position/unitSI                    attr   = 1
  uint64_t  /data/particles/changing_constant/position/value                     attr   = 0
  uint64_t  /data/snapshot                                                       attr   = 0
  double    /data/time                                                           attr   = 0
  double    /data/timeUnitSI                                                     attr   = 1
  string    /date                                                                attr   = "2022-08-19 08:35:01 +0000"
  string    /iterationEncoding                                                   attr   = "variableBased"
  string    /iterationFormat                                                     attr   = "/data"
  string    /meshesPath                                                          attr   = "meshes/"
  string    /openPMD                                                             attr   = "1.1.0"
  uint32_t  /openPMDextension                                                    attr   = 0
  string    /particlesPath                                                       attr   = "particles/"
  string    /software                                                            attr   = "openPMD-api"
  string    /softwareVersion                                                     attr   = "0.15.0-dev"

In group-based encoding however, the snapshot attribute is then at the level above the single iterations:

> bpls ../samples/bp4steps_yes_yes.bp/ -alt                                                                       
Step 0:                                                                                                                      
  string    /basePath                                  attr   = "/data/%T/"                                                  
  double    /data/0/dt                                 attr   = 1                                                            
  string    /data/0/meshes/E/axisLabels                attr   = {"x"}                                                        
  string    /data/0/meshes/E/dataOrder                 attr   = "C"                                                          
  string    /data/0/meshes/E/geometry                  attr   = "cartesian"                                                  
  double    /data/0/meshes/E/gridGlobalOffset          attr   = 0                                                            
  double    /data/0/meshes/E/gridSpacing               attr   = 1                                                            
  double    /data/0/meshes/E/gridUnitSI                attr   = 1                                                            
  float     /data/0/meshes/E/timeOffset                attr   = 0                                                            
  double    /data/0/meshes/E/unitDimension             attr   = {0, 0, 0, 0, 0, 0, 0}                                        
  string    /data/0/meshes/E/vector_of_string          attr   = {"vector", "of", "string"}                                   
  int32_t   /data/0/meshes/E/x                         {10}                                                                  
  double    /data/0/meshes/E/x/position                attr   = 0                                                            
  double    /data/0/meshes/E/x/unitSI                  attr   = 1                                                            
  double    /data/0/time                               attr   = 0                                                            
  double    /data/0/timeUnitSI                         attr   = 1                                                            
  uint64_t  /data/snapshot                             attr   = 0                                                            
  string    /date                                      attr   = "2022-08-19 08:38:04 +0000"                                  
  string    /iterationEncoding                         attr   = "groupBased"                                                 
  string    /iterationFormat                           attr   = "/data/%T/"                                                  
  string    /meshesPath                                attr   = "meshes/"
  string    /openPMD                                   attr   = "1.1.0"
  uint32_t  /openPMDextension                          attr   = 0
  string    /software                                  attr   = "openPMD-api"
  string    /softwareVersion                           attr   = "0.15.0-dev"
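
To check where the attribute ends up in a given file, listings like the two above can be scanned mechanically. The following is a minimal Python sketch, not part of openPMD-api or ADIOS2: the helper names `parse_attributes` and `find_snapshot_paths` are hypothetical, and the regular expression assumes the `type  path  attr = value` line layout visible in the `bpls -alt` output quoted in this thread.

```python
import re

# Assumed line layout (taken from the listings above):
#   "  uint64_t  /data/snapshot    attr   = 0"
# Dataset lines such as "  int32_t  /data/0/meshes/E/x  {10}" carry no
# "attr" marker and are skipped on purpose.
ATTR_LINE = re.compile(r"^\s*(\S+)\s+(\S+)\s+attr\s+=\s+(.*?)\s*$")


def parse_attributes(listing):
    """Map attribute path -> (type name, raw value string)."""
    attrs = {}
    for line in listing.splitlines():
        match = ATTR_LINE.match(line)
        if match:
            dtype, path, value = match.groups()
            attrs[path] = (dtype, value)
    return attrs


def find_snapshot_paths(attrs):
    """Return all attribute paths whose last component is 'snapshot'."""
    return sorted(p for p in attrs if p.rsplit("/", 1)[-1] == "snapshot")


# Shortened excerpt of the group-based listing quoted above.
SAMPLE = """\
Step 0:
  string    /basePath            attr   = "/data/%T/"
  double    /data/0/dt           attr   = 1
  int32_t   /data/0/meshes/E/x   {10}
  uint64_t  /data/snapshot       attr   = 0
  string    /iterationEncoding   attr   = "groupBased"
"""

attrs = parse_attributes(SAMPLE)
snapshot_paths = find_snapshot_paths(attrs)
print(snapshot_paths)
```

In both quoted listings such a scan would report a single location, `/data/snapshot`; what differs between the encodings is whether that path sits inside the iteration or one level above it.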

@ax3l ax3l force-pushed the topic-stepBasedEncoding branch 3 times, most recently from 299e892 to 7823519 Compare April 7, 2021 21:59

In order to correlate openPMD iterations with an index of data-format internal updates/steps or an index in the slowest varying dimension of an array, the *root* group (path `/`) must contain an additional variable once `stepBased` is chosen for `iterationEncoding`:

- `snapshot`

Question: could we allow skipping this if only one iteration (snapshot) is written?
In that case, the implied value should be 0 and there must be exactly one update/step in the data format.

This is what the other backends actually do when that iteration encoding is chosen, see the variableBasedSingleIteration test. The snapshot attribute is not written.

With the current state of openPMD/openPMD-api#949, the snapshot attribute is always written, but not required at read-time (then assumed to be 0). I should add a test somehow to ensure that reading without snapshot works as intended.

now tested
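
The read-time rule discussed in this thread (a missing `snapshot` implies iteration 0, and only makes sense for a single update/step) can be sketched as follows. This is an illustration under stated assumptions, not the openPMD-api implementation: `snapshots_per_step` is a hypothetical helper, and each step is modelled as a plain dictionary of the attributes visible in that step.

```python
def snapshots_per_step(step_attributes):
    """Return one openPMD iteration index per update/step.

    step_attributes: list of dicts, one per step; a hypothetical in-memory
    stand-in for whatever a backend reader would return.
    """
    if not any("snapshot" in attrs for attrs in step_attributes):
        # No snapshot attribute anywhere: accepted only for exactly one
        # step, whose iteration index is then implied to be 0.
        if len(step_attributes) != 1:
            raise ValueError(
                "snapshot may only be omitted when exactly one step exists"
            )
        return [0]

    missing = [i for i, attrs in enumerate(step_attributes)
               if "snapshot" not in attrs]
    if missing:
        raise ValueError(f"steps without a snapshot attribute: {missing}")
    return [int(attrs["snapshot"]) for attrs in step_attributes]


print(snapshots_per_step([{}]))
print(snapshots_per_step([{"snapshot": 100}, {"snapshot": 200}]))
```

Rejecting a multi-step series without `snapshot` is one possible reading of the proposal above; a more lenient reader could instead fall back to the step index.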

@ax3l ax3l changed the title IterationEncoding: stepBased IterationEncoding: variableBased Apr 24, 2021
@ax3l ax3l changed the title IterationEncoding: variableBased IterationEncoding: stepBased Apr 24, 2021
FORMAT_ADIOS.md Outdated
@@ -32,3 +32,28 @@ Output from `bpls -A` for a boolean attribute `pybool` stored in the location of

There is no convention yet for a unique representation of ADIOS2 variables with boolean type.
Thus, implementations should cast the data to and from `unsigned char` instead.

## `stepBased` Encoding of Iterations

Rename: I am not sure why, but for some reason we now call this variableBased in openPMD/openPMD-api#855

@franzpoeschel let's clarify what we pick, shall I update the standard PR to be named variableBased, too?

Yep, we discussed this a while ago and came to the conclusion to call it variable-based since steps are an ADIOS2-specific feature, but this encoding generally relies on a backend's ability to have variable datasets.

@ax3l ax3l changed the title IterationEncoding: stepBased IterationEncoding: variableBased May 20, 2021
FORMAT_ADIOS.md Outdated

## Attributes

openPMD **attributes** stored as ADIOS `Variables` at the location where they would usually be stored.

@ax3l ax3l Jul 29, 2021

@franzpoeschel pointed out that this is currently implemented that way, and since we don't need to change the type of attributes over time, I think we can keep it so:

Suggested change
openPMD **attributes** stored as ADIOS `Variables` at the location where they would usually be stored.
openPMD **attributes** stored as ADIOS `Variables` at the location where they would usually be stored.
The `__is_boolean__/...` qualifiers are still stored as ADIOS `Attribute`.

openPMD attributes stored as ADIOS Variables

Is this outdated? We will remove the new ADIOS2 schema where this happens

FORMAT_ADIOS.md Outdated

## `stepBased` Encoding of Iterations

The `iterationEncoding` mode `stepBased` must be implemented via ADIOS steps.

Suggested change
The `iterationEncoding` mode `stepBased` must be implemented via ADIOS steps.
The `iterationEncoding` mode `variableBased` must be implemented via a backend's feature to describe *variable* datasets and attributes.
This means that such datasets and attributes are present in different versions with different contents.

I think I missed here that this file was ADIOS-specific.

FORMAT_ADIOS.md Outdated

## Datasets

An openPMD **data set** is represented by a group prefix that contains an ADIOS variable `__data__`.

This describes the ADIOS2 schema that we will abolish

Generally, we will stick with a schema similar to the current one, in which datasets and attributes are distinguished by different backend features (variables and attributes), but with two additions:

  • Attributes can be variable now too
  • We will use some protocol for identifying if a group is active in the current step

Since that updated schema is not yet implemented, I'd suggest we don't describe this just yet. I would not like to standardize something that in the end turns out to not work well.

I've started exploring that new schema here.


An openPMD **data set** is represented by a group prefix that contains an ADIOS variable `__data__`.

**attributes** are defined further below and can also appear at the dataset's **group** prefix level.

What does "further below" mean in this context?

```
double /data/meshes/E/x/__data__ 10*{1000}
double /data/meshes/E/x/position 10*{1}
double /data/meshes/E/x/unitSI 10*scalar
```

Outdated. Since the old ADIOS2 schema does not yet support variable-based iteration encoding, we currently only have this kind of experimental implementation.

STANDARD.md Outdated
@@ -212,6 +216,7 @@ Each file's *root* group (path `/`) must further define the attributes:
- allowed values:
  - `fileBased` (multiple files)
  - `groupBased` (one file)
  - `stepBased` (one file with internal encoding for iterations, if supported by the data format)

Suggested change
- `stepBased` (one file with internal encoding for iterations, if supported by the data format)
- `variableBased` (one file with internal encoding for iterations, if supported by the data format)

STANDARD.md Outdated
- allowed values:
- see *Iterations and Time Series* below
- for `fileBased` and `groupBased`, this is fixed to `/data/%T/`
- for `stepBased` this is fixed to `/data/`

Suggested change
- for `stepBased` this is fixed to `/data/`
- for `variableBased` this is fixed to `/data/`
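
As an illustration of what the two fixed values mean for a reader, here is a small Python sketch that resolves the group path of an iteration for each encoding. The function name `iteration_path` is hypothetical, and the encoding name `variableBased` follows the rename suggested in this thread.

```python
def iteration_path(encoding, iteration):
    """Resolve the group path that holds the data of one iteration.

    Assumes the fixed templates quoted in the diff: '/data/%T/' for
    fileBased and groupBased, '/data/' for variableBased.
    """
    if encoding in ("fileBased", "groupBased"):
        # %T is replaced by the integer iteration index.
        return "/data/%T/".replace("%T", str(iteration))
    if encoding == "variableBased":
        # All iterations share one group; the iteration index is tracked
        # separately (the `snapshot` value) instead of in the path.
        return "/data/"
    raise ValueError(f"unknown iterationEncoding: {encoding!r}")


print(iteration_path("groupBased", 100))
print(iteration_path("variableBased", 100))
```

The variable-based branch ignores its `iteration` argument, which matches the listings earlier in the thread where records appear directly under `/data/meshes/...`.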

STANDARD.md Outdated
- data-format internal convention
- *slowest varying index* of data

### `stepBased` Encoding of Iterations

Suggested change
### `stepBased` Encoding of Iterations
### `variableBased` Encoding of Iterations

STANDARD.md Outdated

### `stepBased` Encoding of Iterations

In order to correlate openPMD iterations with an index of data-format internal updates/steps or an index in the slowest varying dimension of an array, the *root* group (path `/`) must contain an additional variable once `stepBased` is chosen for `iterationEncoding`:

Suggested change
In order to correlate openPMD iterations with an index of data-format internal updates/steps or an index in the slowest varying dimension of an array, the *root* group (path `/`) must contain an additional variable once `stepBased` is chosen for `iterationEncoding`:
In order to correlate openPMD iterations with an index of data-format internal updates/steps or an index in the slowest varying dimension of an array, the *root* group (path `/`) must contain an additional variable once `variableBased` is chosen for `iterationEncoding`:

In order to correlate openPMD iterations with an index of data-format internal updates/steps or an index in the slowest varying dimension of an array, the *root* group (path `/`) must contain an additional variable once `stepBased` is chosen for `iterationEncoding`:

- `snapshot`
  - type: 1-dimensional array containing N *(int)* elements, where N is the number of updates/steps in the data format

This is currently not implemented as an array, but as a scalar variable that changes across steps.

The implementation actually accepts arrays at read-time, but I should test that it works. At write time, the API currently only produces scalars.

now tested
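
A reader that tolerates both forms mentioned here (the array from the draft text and the scalar that the API currently writes) could look roughly like the sketch below. It is an assumption-laden illustration: `iteration_for_step` is a hypothetical helper, and indexing the array form by step number follows the draft wording ("N elements, where N is the number of updates/steps"), not a confirmed implementation detail.

```python
def iteration_for_step(snapshot, step):
    """Return the openPMD iteration index that belongs to `step`.

    snapshot: the value read in that step, either a scalar (one value that
    changes from step to step) or a 1-dimensional sequence with one entry
    per step (the array form of the draft text).
    """
    if isinstance(snapshot, (list, tuple)):
        # Array form: one entry per update/step, indexed by step number.
        return int(snapshot[step])
    # Scalar form: the value read in this step is the iteration itself.
    return int(snapshot)


# Array form as described in the draft standard text.
print(iteration_for_step([0, 100, 200], 1))
# Scalar form as currently written by the API, read while in step 1.
print(iteration_for_step(100, 1))
```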

  - description: for each update/step in a data format, this variable needs to be updated with the corresponding openPMD iteration.
  - note: in some data formats, updates/steps are absolute and not every update/step contains an update for each declared openPMD record
  - advice to implementers: an openPMD iteration might be spread over multiple updates/steps, but not vice versa.
    In such a scenario, an individual openPMD record's update/step must appear exactly once per iteration.
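
The advice to implementers above can be checked mechanically. The sketch below is a hypothetical validator, not part of any openPMD tool: it models each update/step as a pair of one iteration index and the record names updated in that step, so "not vice versa" (a step never spans several iterations) holds by construction, and it reports records that appear more than once or not at all within an iteration.

```python
from collections import defaultdict


def validate_steps(steps, declared_records):
    """Check the 'exactly once per iteration' rule.

    steps: list of (iteration, records_updated) pairs, one per update/step.
    declared_records: names of all records the series declares.
    Returns (duplicates, missing):
      duplicates: list of (step index, iteration, record) for repeated updates
      missing: dict iteration -> sorted list of records never updated
    """
    seen = defaultdict(set)
    duplicates = []
    for step_index, (iteration, records) in enumerate(steps):
        updated = seen[iteration]  # also registers iterations with no records
        for record in records:
            if record in updated:
                duplicates.append((step_index, iteration, record))
            updated.add(record)

    declared = set(declared_records)
    missing = {
        iteration: sorted(declared - updated)
        for iteration, updated in seen.items()
        if declared - updated
    }
    return duplicates, missing


# Iteration 0 is spread over two steps, iteration 100 fits into one.
ok = validate_steps([(0, ["E"]), (0, ["B"]), (100, ["E", "B"])], ["E", "B"])
# Iteration 0 updates "E" twice and never updates "B".
bad = validate_steps([(0, ["E"]), (0, ["E"])], ["E", "B"])
print(ok, bad)
```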

Note: A similar situation can occur when using Append mode: An iteration is then present multiple times with redundant definitions. This will either be solved by truncation or by reading only the first/last instance of that iteration.
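
The second option mentioned in this note, reading only the first or last instance of a repeated iteration, could be sketched like this. `select_instances` is a hypothetical helper that works on the per-step `snapshot` values only; it is not how any existing reader is known to resolve Append-mode duplicates.

```python
def select_instances(snapshots, keep="last"):
    """Pick one step per iteration when iterations repeat.

    snapshots: iteration index for each update/step, in step order.
    keep: "first" or "last" instance of a repeated iteration.
    Returns a dict mapping iteration -> chosen step index.
    """
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    chosen = {}
    for step, iteration in enumerate(snapshots):
        # "last": later steps overwrite earlier ones;
        # "first": only the first occurrence is recorded.
        if keep == "last" or iteration not in chosen:
            chosen[iteration] = step
    return chosen


# Iteration 0 was written again (step 2) after appending to the series.
print(select_instances([0, 100, 0], keep="last"))
print(select_instances([0, 100, 0], keep="first"))
```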

STANDARD.md Outdated
files (`fileBased`) or series of groups (`groupBased`) should have
attributes that describe the current time and the last
time step.
In addition to holding information about the iteration, each series of files (`fileBased`), series of groups (`groupBased`) or internally encoded iterations (`stepBased`) should have attributes that describe the current time and the last time step.

Suggested change
In addition to holding information about the iteration, each series of files (`fileBased`), series of groups (`groupBased`) or internally encoded iterations (`stepBased`) should have attributes that describe the current time and the last time step.
In addition to holding information about the iteration, each series of files (`fileBased`), series of groups (`groupBased`) or internally encoded iterations (`variableBased`) should have attributes that describe the current time and the last time step.
