Reading gappy data incurs a significant performance penalty compared to non-gappy data.
Attached are a read timing test and test data. The test script times the read of a specified miniSEED file using, for reference, obspy.read() followed by pyasdf's add_waveforms(). The test data are one day each of gappy (2200+ gaps) and non-gappy time series.
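For readers without the attachment, a comparison of this kind could be sketched roughly as follows. This is not the attached script: the helper names `time_call` and `time_read_and_add`, the file paths, and the tag value are assumptions made for illustration. Only `obspy.read()`, `pyasdf.ASDFDataSet`, and `add_waveforms()` are real library calls.

```python
import time


def time_call(func, *args, **kwargs):
    """Call func once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def time_read_and_add(mseed_path, asdf_path, tag="raw_recording"):
    """Time obspy.read() and ASDFDataSet.add_waveforms() for one file.

    Hypothetical helper: the paths and the tag are placeholders.
    """
    # Imported here so time_call() stays usable without obspy/pyasdf installed.
    import obspy
    import pyasdf

    stream, read_seconds = time_call(obspy.read, mseed_path)
    ds = pyasdf.ASDFDataSet(asdf_path)
    _, add_seconds = time_call(ds.add_waveforms, stream, tag=tag)
    del ds  # drop the reference so the underlying HDF5 file is closed
    # The number of traces is reported alongside the two timings.
    return len(stream), read_seconds, add_seconds
```

Calling `time_read_and_add()` once on the gappy file and once on the non-gappy file, each with a fresh output path, would give the trace count and both timings side by side.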
I took a brief look at the data. Although "gappy-day.mseed" spans only one day overall, it contains 2250 individual traces. Writing it to the HDF5 file therefore creates 2250 h5py.Dataset objects, one per trace. I suspect that is what causes the performance issue.
@krischer What do you think? I can do more performance profiling if needed, but I think most of the cost comes from HDF5 (or h5py).
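To illustrate why gaps multiply the number of traces, here is a minimal sketch of how a gap splits otherwise continuous data into separate segments. It is not ObsPy's or pyasdf's actual logic; the function name, the `(start, end)` record representation, and the tolerance are assumptions chosen for the example.

```python
def count_contiguous_segments(records, dt, tolerance=0.5):
    """Count contiguous segments in time-sorted records.

    records: list of (start_time, end_time) tuples in seconds, where
             end_time is the time of the last sample in the record.
    dt: sample interval in seconds.
    tolerance: allowed mismatch, as a fraction of dt, before a gap
               (or overlap) is declared. Illustrative value only.
    """
    if not records:
        return 0
    segments = 1
    prev_end = records[0][1]
    for start, end in records[1:]:
        expected_start = prev_end + dt
        # A record that does not start where the previous one ended
        # begins a new segment, i.e. a new trace.
        if abs(start - expected_start) > tolerance * dt:
            segments += 1
        prev_end = end
    return segments


# Four records sampled at 1 Hz with one gap between t=19 s and t=25 s.
records = [(0.0, 9.0), (10.0, 19.0), (25.0, 34.0), (35.0, 44.0)]
print(count_contiguous_segments(records, dt=1.0))
```

Under this model each gap adds one segment, so a file with N gaps yields N + 1 traces, which matches the order of magnitude reported above (2200+ gaps, 2250 traces).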
On my machine:
The add_waveforms() method, at 7.6 seconds, is more than an order of magnitude slower than an obspy.read() of the same data at 0.49 seconds.
Obviously it would be nice if this were faster. As ASDF gains popularity, it will be used with an increasingly broad set of input data.
read-timing-test.zip