Slow performance when adding gappy data to ASDF #57

Open
chad-earthscope opened this issue Oct 26, 2019 · 3 comments

Comments

@chad-earthscope

Reading gappy data incurs a significant performance penalty compared to non-gappy data.

Attached are a read timing test script and test data. For each specified miniSEED file, the script times obspy.read() (for reference) followed by pyasdf's add_waveforms(). The test data are one day each of gappy (2200+ gaps) and non-gappy time series.
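
The attached script is not reproduced in this thread. As a rough illustration of the kind of measurement described, here is a minimal sketch. The helper names `timed` and `time_file` and the tag "raw_recording" are assumptions for illustration, not taken from the attachment; `dataset` is assumed to be an already-open pyasdf ASDFDataSet.

```python
import time


def timed(label, func, *args, **kwargs):
    """Call func once, print the elapsed wall-clock time, return (result, seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed} seconds")
    return result, elapsed


def time_file(path, dataset, tag="raw_recording"):
    """Time obspy.read() and then add_waveforms() for one miniSEED file.

    Hypothetical helper: `dataset` is assumed to be an open
    pyasdf.ASDFDataSet, and the tag name is an arbitrary choice.
    """
    # Imported here so that timed() stays usable without ObsPy installed.
    import obspy

    stream, read_seconds = timed("ObsPy read()", obspy.read, path)
    _, add_seconds = timed(
        "ASDF add_waveforms()", dataset.add_waveforms, stream, tag=tag
    )
    return read_seconds, add_seconds
```

With ObsPy and pyasdf installed, one would open the output volume with something like `pyasdf.ASDFDataSet("output.h5")` and call `time_file()` once per input file.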

On my machine:

$ ./read-timing-test.py -o output.h5 clean-day.mseed gappy-day.mseed 
Opening output ASDF volume: output.h5
Processing clean-day.mseed
ObsPy read(): 0.05147713300000012 seconds
ASDF add_waveforms(): 0.1556968969999999 seconds
Processing gappy-day.mseed
ObsPy read(): 0.49582375 seconds
ASDF add_waveforms(): 7.62076154 seconds

At 7.6 seconds, add_waveforms() is about 15 times slower than obspy.read() of the same data (0.50 seconds).

Obviously it would be nice if this were faster. As ASDF gains popularity it will be used with a likewise-broadening set of input data.

read-timing-test.zip

@wjlei1990

Hi Chad,

I took a brief look at the data. Although "gappy-day.mseed" covers only one day overall, it contains 2250 individual traces. Writing it to the HDF5 file therefore creates 2250 h5py.Dataset objects, which I suspect is the cause of the performance issue.
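
To illustrate why gaps multiply the number of traces (and hence datasets), here is a minimal pure-Python sketch. The function `split_contiguous` and its `tolerance` parameter are hypothetical and are not how ObsPy or pyasdf actually segment data; the point is only that N gaps in one channel yield N + 1 contiguous segments, each of which would be stored separately.

```python
def split_contiguous(sample_times, dt, tolerance=0.5):
    """Group sorted sample times into contiguous runs.

    A new run starts whenever the step from the previous sample
    exceeds dt * (1 + tolerance), i.e. whenever there is a gap.
    """
    segments = []
    for t in sample_times:
        if segments and t - segments[-1][-1] <= dt * (1 + tolerance):
            segments[-1].append(t)  # still contiguous with the current run
        else:
            segments.append([t])  # first sample, or a gap: start a new run
    return segments


# Six samples at a 1-second interval with two gaps (after 2.0 and after 6.0).
times = [0.0, 1.0, 2.0, 5.0, 6.0, 10.0]
segments = split_contiguous(times, dt=1.0)
print(len(segments))  # 3 segments: two gaps give three separate traces
```

With ObsPy, the equivalent check on real data would be `len(stream)` for the trace count and `stream.get_gaps()` for the gap list.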

@krischer What do you think? I can do more performance profiling if needed, but I think most of the cost comes from HDF5 (or h5py).

@krischer
Member

I think this PR (if extended a bit) would be a good solution for this particular problem: #49

@wjlei1990

Hi, thanks very much. Is there a timeline for merging this PR?
