Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite H5MD module #4480

Merged
merged 9 commits into from
Apr 4, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
128 changes: 85 additions & 43 deletions doc/sphinx/io.rst
Original file line number Diff line number Diff line change
Expand Up @@ -132,14 +132,14 @@ Writing H5MD-files
either ``libhdf5-openmpi-dev`` for OpenMPI or ``libhdf5-mpich-dev`` for
MPICH, but not ``libhdf5-dev`` which is the serial version.

For large amounts of data it's a good idea to store it in the hdf5 (H5MD
is based on hdf5) file format (see https://www.hdfgroup.org/ for
details). Currently |es| supports some basic functions for writing simulation
For long simulations, it's a good idea to store data in the hdf5 file format
(see https://www.hdfgroup.org for details, H5MD is based on hdf5).
Currently |es| supports some basic functions for writing simulation
data to H5MD files. The implementation is MPI-parallelized and is capable
of dealing with varying numbers of particles.
of dealing with a varying number of particles.

To write data in a hdf5-file according to the H5MD proposal (https://nongnu.org/h5md/),
first an object of the class
To write data in a hdf5-file according to the H5MD proposal
(https://nongnu.org/h5md), first an object of the class
:class:`espressomd.io.writer.h5md.H5md` has to be created and linked to the
respective hdf5-file. This may, for example, look like:

Expand All @@ -152,56 +152,98 @@ respective hdf5-file. This may, for example, look like:

An optional argument to the constructor of :class:`espressomd.io.writer.h5md.H5md` is
an instance of :class:`espressomd.io.writer.h5md.UnitSystem` which encapsulates
physical units for time, mass, length and electrical charge.
physical units for time, mass, length and electrical charge.

If a file at the given filepath exists and has a valid H5MD structure,
it will be backed up to a file with suffix ".bak". This backup file will be
deleted when the new file is closed at the end of the simulation with
``h5.close()``.

The current implementation always writes the following properties: positions,
velocities, forces, species (|es| types), charges, and masses of the particles.

In simulations with varying numbers of particles (MC or reactions), the
it will be backed up to a file with suffix ".bak" and loaded into
a new file. Therefore H5MD can be used together with checkpointing.
The backup file will be deleted when the new file is closed at the end of the
simulation with :meth:`~espressomd.io.writer.h5md.H5md.close()`. The backup
file is not be erased if the simulation terminates unexpectedly.

To write data to the HDF5 file, simply call the method
:meth:`~espressomd.io.writer.h5md.H5md.write` without any arguments.
After the last write, call :meth:`~espressomd.io.writer.h5md.H5md.flush()`
and then :meth:`~espressomd.io.writer.h5md.H5md.close()`
to close the datasets and remove the backup file.

The current implementation writes the following properties by default: folded
positions, periodic image count, velocities, forces, species (|es| types),
charges and masses of the particles. While folded positions are written
to disk, the unfolded coordinates can be reconstructed from the image count.
The time-dependent box size and Lees-Edwards parameters are also stored.
Some of these properties can be opted out by specifying in argument
``fields`` the subset of fields to write to the trajectory file;
call method :meth:`~espressomd.io.writer.h5md.H5md.valid_fields()`
to find out which string corresponds to which field.

In simulations with a varying number of particles (Monte-Carlo reactions), the
size of the dataset will be adapted if the maximum number of particles
increases but will not be decreased. Instead a negative fill value will
be written to the trajectory for the id. If you have a parallel
be written to the trajectory for the id.

If you have a parallel
simulation, please keep in mind that the sequence of particles in general
changes from timestep to timestep. Therefore you have to always use the
dataset for the ids to track which position/velocity/force/type/mass
entry belongs to which particle. To write data to the HDF5 file, simply
call the method :meth:`~espressomd.io.writer.h5md.H5md.write` without any arguments.

After the last write, you have to call
:meth:`~espressomd.io.writer.h5md.H5md.close` to remove
the backup file, close the datasets, etc.

H5MD files can be read and modified with the python module h5py (for
documentation see `h5py <https://docs.h5py.org/en/stable/>`_). For example,
all positions stored in the file called "sample.h5" can be read using:

.. code:: python
entry belongs to which particle.

import h5py
h5file = h5py.File("sample.h5", 'r')
positions = h5file['particles/atoms/position/value']

If the data was stored with `physical units
<https://nongnu.org/h5md/modules/units.html>`_, they can be accessed with:
For an example involving physical units, see :file:`/samples/h5md.py`.

.. code:: python
.. _Reading H5MD-files:

positions.attrs['unit']
forces = h5file['particles/atoms/force/value']
forces_unit = forces.attrs['unit']
sim_time = h5file['particles/atoms/id/time']
print('last frame: {:.3f} {}'.format(sim_time.value[-1], sim_time.attrs['unit'].decode('utf8')))
Reading H5MD-files
------------------

Furthermore, the files can be inspected with the GUI tool hdfview or visually with the
H5MD VMD plugin (see `H5MD plugin <https://github.com/h5md/VMD-h5mdplugin>`_).
H5MD files can be read and sometimes modified by many tools. If the data was
stored with `physical units <https://nongnu.org/h5md/modules/units.html>`__,
they can be accessed by reading the group attributes. Since the data is
written in parallel, the particles are unsorted; if particles were created
with increasing particle id and no particle deletion occurred during the
simulation, the coordinates can be sorted with a simply numpy operation.

For an example involving physical units, see :file:`/samples/h5md.py`.
To read with the python module ``h5py`` (documentation:
`HDF5 for Python <https://docs.h5py.org/en/stable>`__)::

import h5py
with h5py.File("sample.h5", mode='r') as h5file:
positions = h5file['particles/atoms/position/value']
positions.attrs['unit']
forces = h5file['particles/atoms/force/value']
forces_unit = forces.attrs['unit']
sim_time = h5file['particles/atoms/id/time']
print(f"last frame: {sim_time[-2]:.3f} {sim_time.attrs['unit'].decode('utf8')}")

To read with the python module ``pandas`` (documentation: `HDFStore: PyTables
<https://pandas.pydata.org/docs/reference/io.html#hdfstore-pytables-hdf5>`_)::

import pandas
with pandas.HDFStore("sample.h5", mode='r') as h5file:
positions = h5file.root.particles.atoms.position.value
positions.attrs['unit']
forces = h5file.root.particles.atoms.force.value
forces_unit = forces.attrs['unit']
sim_time = h5file.root.particles.atoms.id.time
print(f"last frame: {sim_time[-2]:.3f} {sim_time.attrs['unit'].decode('utf8')}")

To read from the command line with
`h5dump <https://support.hdfgroup.org/HDF5/doc/RM/Tools/h5dump.htm>`__
(Ubuntu package ``hdf5-tools``):

.. code:: sh

# show metadata only
h5dump --header sample.h5 | less
# show metadata + data
h5dump sample.h5 | less

H5MD files can also be inspected with the GUI tool
`HDFView <https://www.hdfgroup.org/downloads/hdfview>`__ (Ubuntu package
``hdfview``) or visually with the H5MD VMD plugin (GitHub project
`h5md/VMD-h5mdplugin <https://github.com/h5md/VMD-h5mdplugin>`__).

For an example involving ``h5py``, coordinates resorting and reconstruction
of the unfolded coordinates, see :file:`/samples/h5md_trajectory.py`.

.. _Writing MPI-IO binary files:

Expand Down
103 changes: 103 additions & 0 deletions samples/h5md_trajectory.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,103 @@
#
# Copyright (C) 2022 The ESPResSo project
#
# This file is part of ESPResSo.
#
# ESPResSo is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# ESPResSo is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
#

"""
Write ESPResSo trajectories in the H5MD format with a Lees-Edwards offset,
aperiodic boundaries and a fluctutating box size. Read the trajectory and
reconstruct the unfolded positions. See :ref:`Writing H5MD-files` for details.
"""

import espressomd
import espressomd.io.writer
import espressomd.lees_edwards

import os
import h5py
import numpy as np
import tempfile

# set particles outside the main box to get folded coordinates
system = espressomd.System(box_l=[6, 7, 8], periodicity=[True, True, False])
system.time_step = 0.1
system.cell_system.skin = 0.0
system.lees_edwards.shear_direction = 0
system.lees_edwards.shear_plane_normal = 1
for i in range(12):
p = system.part.add(pos=3 * [float(i - 4)], v=[0., 1., 0.])

# record particle positions in a H5MD file and in python lists
temp_directory = tempfile.TemporaryDirectory()
temp_file = os.path.join(temp_directory.name, 'test.h5')
h5 = espressomd.io.writer.h5md.H5md(file_path=temp_file)
xyz_folded = []
xyz_unfolded = []
# write coordinates with aperiodic boundary conditions
h5.write()
xyz_folded.append(system.part.all().pos_folded[:])
xyz_unfolded.append(system.part.all().pos[:])
# add Lees-Edwards offset
system.lees_edwards.protocol = espressomd.lees_edwards.LinearShear(
shear_velocity=1., initial_pos_offset=0., time_0=0.)
system.integrator.run(20)
h5.write()
xyz_folded.append(system.part.all().pos_folded[:])
xyz_unfolded.append(system.part.all().pos[:])
# remove Lees-Edwards offset
system.lees_edwards.protocol = espressomd.lees_edwards.Off()
system.integrator.run(10)
h5.write()
xyz_folded.append(system.part.all().pos_folded[:])
xyz_unfolded.append(system.part.all().pos[:])
# resize box (simulates NpT)
system.box_l = system.box_l + 1.
system.integrator.run(10)
h5.write()
xyz_folded.append(system.part.all().pos_folded[:])
xyz_unfolded.append(system.part.all().pos[:])
h5.flush()
h5.close()
xyz_folded = np.array(xyz_folded)
xyz_unfolded = np.array(xyz_unfolded)

# read H5MD data
with h5py.File(temp_file, mode='r') as h5file:
p_id = h5file['particles/atoms/id/value'][:]
argsort = (np.tile(range(p_id.shape[0]), (p_id.shape[1], 1)).T,
np.unravel_index(np.argsort(p_id, axis=-1), p_id.shape)[1])
h5_pos = h5file['particles/atoms/position/value'][:][argsort]
h5_img = h5file['particles/atoms/image/value'][:][argsort]
h5_box = h5file['particles/atoms/box/edges/value'][:]

# apply unfolding
pos_folded = h5_pos
pos_unfolded = h5_img * h5_box[:, np.newaxis, :] + h5_pos

# report results from last frame
with np.printoptions(precision=1, suppress=True):
print('unfolded coordinates:')
print(' from ESPResSo')
print(xyz_unfolded[-1])
print(' from H5MD')
print(pos_unfolded[-1])
print('')
print('folded coordinates:')
print(' from ESPResSo')
print(xyz_folded[-1])
print(' from H5MD')
print(pos_folded[-1])
6 changes: 3 additions & 3 deletions src/core/BoxGeometry.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -187,8 +187,8 @@ class BoxGeometry {
Utils::Vector<T, 3> get_mi_vector(const Utils::Vector<T, 3> &a,
const Utils::Vector<T, 3> &b) const {
if (type() == BoxType::LEES_EDWARDS) {
return clees_edwards_bc().distance(a - b, length(), length_half(),
length_inv(), m_periodic);
return lees_edwards_bc().distance(a - b, length(), length_half(),
length_inv(), m_periodic);
}
assert(type() == BoxType::CUBOID);
return {get_mi_coord(a[0], b[0], 0), get_mi_coord(a[1], b[1], 1),
Expand All @@ -199,7 +199,7 @@ class BoxGeometry {
void set_type(BoxType type) { m_type = type; }

LeesEdwardsBC &lees_edwards_bc() { return m_lees_edwards_bc; }
LeesEdwardsBC const &clees_edwards_bc() const { return m_lees_edwards_bc; }
LeesEdwardsBC const &lees_edwards_bc() const { return m_lees_edwards_bc; }
void set_lees_edwards_bc(LeesEdwardsBC bc) { m_lees_edwards_bc = bc; }

/** Calculate the velocity difference including the Lees-Edwards velocity */
Expand Down
2 changes: 1 addition & 1 deletion src/core/CellStructure.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -236,7 +236,7 @@ void CellStructure::resort_particles(int global_flag, BoxGeometry const &box) {
}

m_rebuild_verlet_list = true;
m_le_pos_offset_at_last_resort = box.clees_edwards_bc().pos_offset;
m_le_pos_offset_at_last_resort = box.lees_edwards_bc().pos_offset;

#ifdef ADDITIONAL_CHECKS
check_particle_index();
Expand Down
Loading