# Investigating Performance Regression in ADIOS BP for #400
BP Data:

HDF5 Data:

---
Benchmark: reads & processes field data

```python
from openpmd_viewer.addons import LpaDiagnostics

ts = LpaDiagnostics('diag1/')
a0 = ts.iterate(ts.get_a0, pol='y')  # ~1 (BP) or ~50 (H5) iterations processed per second
```

Warnings on open for BP series:

---
The access pattern is quite simple: 2-4 reads on RZ fields (full read) per step, which take 95-97% of the time of this benchmark.

---
Definitely on the openPMD-api side of things. CC @guj and @franzpoeschel for us to dig deeper:

```python
from openpmd_viewer.addons import LpaDiagnostics
from openpmd_viewer.openpmd_timeseries.data_reader import io_reader

%load_ext line_profiler
```

BP4:

```python
ts_bp = LpaDiagnostics('diag1/')

%lprun -f io_reader.read_field_circ -f io_reader.field_reader.get_data \
    a0_bp = ts_bp.iterate(ts_bp.get_a0, pol='y')  # ~50 per second
```

HDF5:

```python
ts_h5 = LpaDiagnostics('lab_diags/hdf5/')

%lprun -f io_reader.read_field_circ -f io_reader.field_reader.get_data \
    a0_h5 = ts_h5.iterate(ts_h5.get_a0, pol='y')  # ~50 per second
```

---
BP4 directly with openPMD-api:

```python
import openpmd_api as io

s = io.Series("diag1/openpmd_%T.bp", io.Access.read_only)
# io.list_series(s, True)
```

```python
%%time
for k_i, it in s.iterations.items():
    print("Iteration: {0}".format(k_i))
    E = it.meshes["E"]
    E_r = E["r"]
    E_t = E["t"]
    E_z = E["z"]
    # emulate multiple reads
    for _ in range(4):
        E_r_data = E_r[()]
        s.flush()
        E_t_data = E_t[()]
        s.flush()
        E_z_data = E_z[()]
        s.flush()
```

---
HDF5 directly with openPMD-api:

```python
s_h5 = io.Series("lab_diags/hdf5/data%T.h5", io.Access.read_only)
```

```python
%%time
for k_i, it in s_h5.iterations.items():
    print("Iteration: {0}".format(k_i))
    E = it.meshes["E"]
    E_r = E["r"]
    E_t = E["t"]
    E_z = E["z"]
    # emulate multiple reads
    for _ in range(4):
        E_r_data = E_r[()]
        s_h5.flush()
        E_t_data = E_t[()]
        s_h5.flush()
        E_z_data = E_z[()]
        s_h5.flush()
```

---
Next: reorganize both HDF5 and ADIOS BP4 with `openPMD-pipe`; a sketch of the kind of rewrite it performs follows below.
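To make the idea concrete, here is a minimal, hedged sketch in the spirit of `openPMD-pipe`, not the tool's actual implementation: read each record component fully and write it back as one contiguous block. The output path is a placeholder, attribute copying is omitted, and the real tool streams chunk-wise.

```python
import openpmd_api as io

src = io.Series("diag1/openpmd_%T.bp", io.Access.read_only)
dst = io.Series("reorganized/openpmd_%T.bp", io.Access.create)  # placeholder path

for index, it_in in src.iterations.items():
    it_out = dst.iterations[index]
    for mesh_name, mesh_in in it_in.meshes.items():
        mesh_out = it_out.meshes[mesh_name]
        for comp_name, rc_in in mesh_in.items():
            data = rc_in.load_chunk()  # full read of the (decomposed) component
            src.flush()
            rc_out = mesh_out[comp_name]
            rc_out.reset_dataset(io.Dataset(data.dtype, rc_in.shape))
            rc_out.store_chunk(data)   # one contiguous write
            dst.flush()

del dst  # closes the output series
```

After such a rewrite, each component is a single on-disk block, so a full read no longer has to stitch together many load-balanced pieces.

---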
Thanks Axel. Is the file available somewhere?

---
@guj I can share the files on NERSC with you 👍

---
Please consider also what I wrote here. My current suspicion is that the data was written from an application that uses load balancing. The increasing load times in ADIOS2 would then just "represent" the increased complexity in the data (would be confirmed by showing the output of `bpls -D`).
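For reference, the same decomposition can also be inspected from Python instead of `bpls -D`; a small sketch using openPMD-api's `available_chunks()`, with the series path assumed to be the BP4 data from above:

```python
import openpmd_api as io

s = io.Series("diag1/openpmd_%T.bp", io.Access.read_only)
for index, it in s.iterations.items():
    E_r = it.meshes["E"]["r"]
    # one entry per block written by a producer rank; many small,
    # varying blocks would point to load balancing on the writer side
    for chunk in E_r.available_chunks():
        print(index, chunk.offset, chunk.extent)
```

---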
Thanks @franzpoeschel! I posted the output of the decomposition a bit further up in a comment.

---
Ah, I didn't see that. Still need to go over your results in detail.

---
Is this really the HDF5 benchmark where you are measuring this? HDF5 uses a simple fallback implementation for this, while the same function in ADIOS2 can have a serious performance impact.

---
Yes, above I measure the HDF5 files and the ADIOS2 files, sorted under the respective captions. Indeed, very large time spent in here for HDF5… longer than flush even o.0

---
Good point, added a full overview here now. Looking at the data that we benchmark here (RZ fields), the variable …

---
I did some first test runs on Perlmutter. (Using for now …)

I'll prepare a PR that fixes this performance bug, but this will only be useful for the openPMD-viewer if it actually uses …

### How to do a benchmark of compiled (non-Python) software on Perlmutter

I used google-perftools for finding this performance bug. It might be helpful to see your results.

#### Installation

Fortunately relatively simple, since Perlmutter has all dependencies available without needing any additional modules loaded. Gperftools consists of two components.

#### Profiling an application

The Python script whose C++ portion I wanted to profile was:
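A minimal sketch of such a script, assuming it is essentially the direct openPMD-api read loop from earlier in this thread:

```python
import openpmd_api as io

s = io.Series("diag1/openpmd_%T.bp", io.Access.read_only)
for index, it in s.iterations.items():
    E = it.meshes["E"]
    for comp in ("r", "t", "z"):
        data = E[comp][()]  # request a full read of the component
        s.flush()           # the actual I/O happens here, inside the C++ library
```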
In order to profile this:

This runs the application with a slight slowdown. Afterwards, a file named …

Find below the example graph, which shows some Python internals, but more importantly the openPMD-api and ADIOS2-internal calls.

---
Performance analysis by @guj:

Action items:

---
@RemiLehe reported and shared an HDF5/ADIOS2 data set with me that is 30x slower for BP than H5 in `ts.iterate(ts.get_a0, pol='y')`. Also, it gets slower the more often it is called.

- `bpls -D ...`: for decomposed data on disk
- `openPMD-pipe` could be used to reorganize the data to be contiguous (in fact, its default)
- `defer_iteration_parsing` (https://openpmd-api.readthedocs.io/en/0.15.2/details/backendconfig.html) when the user specifies `check_all_files=False`; see the sketch after this list

X-ref #380
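As an illustration of the last item, a minimal sketch of opening a series with deferred iteration parsing via the JSON backend configuration documented at the link above (the series path is taken from the benchmark above):

```python
import openpmd_api as io

# The third argument is a JSON backend configuration; with deferred parsing,
# iterations are only parsed when actually accessed.
s = io.Series(
    "diag1/openpmd_%T.bp",
    io.Access.read_only,
    '{"defer_iteration_parsing": true}',
)
```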