options for download organizations into files #115
Comments
Hi @mdenolle, what error are you receiving when you try to read the ASDF file produced by ROVER using pyasdf? I just ran a very small data retrieval, and was able to read the i.e.
|
Hello, ROVER is designed to download data as station-day files. This resolution was chosen because it seems to optimize the speed of ROVER downloads and indexing while limiting file clutter. If we were to chunk the data into large miniSEED files (e.g. all requested time for one station per file), the returned file could become massive and completely unusable. Furthermore, ROVER's indexing component might not work, because we would have to read a file of highly variable size into memory. This does not seem like a robust design. We do try to provide tools to help with database management and exploration. |
Hello, and thank you both for your responses.

To @timronan: we are a group that works with years of data from 500+ stations. For these large-N, large-T studies, having many small files is really impractical for I/O performance, which is why we turned to ASDF. I think that letting the user define how big a file can be would be a great addition to ROVER. Note that most laptops can read 1 GB of data into memory, and that one day of 100 Hz data in miniSEED is about 2.2 MB in the examples below. Our code will just reconcatenate the files as postprocessing anyway.

To @nick-iris: you are correct that the script works for me. I am trying to modify rover.config, but using the command line will enable better scripting to get more files. However, I have seen two inconsistencies in the ASDF download. Sometimes it works:

(obspy) user@ubuntu:~/TEST_ROVER/data$ time rover retrieve IU_ANMO_10_HHZ 2012-01-01 2012-02-01 --output-format=asdf --asdf-filename=crap.h5

Sometimes afterwards (after removing the h5 data file), in the same terminal, it does not:

(obspy) user@ubuntu: real 0m3.360s

The second problem I see in ROVER is that downloading the same data is 10-12 times faster as miniSEED than as ASDF. I suspect that the IRIS server packages the miniSEED files better: all of the traces (when the data is gappy) are downloaded in one single miniSEED file, whereas I suspect that each trace (between gaps) is downloaded separately with ASDF. It would be more practical for users to have the ASDF file created on the IRIS end, followed by a single download. I am also not sure whether the ASDF file is closed and reopened for each trace download.

Voila, happy to discuss where my bugs are. I am a fan of ROVER; I am just going to be a very heavy user of it, so it seems appropriate to provide feedback. |
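For scale, the figures quoted in this comment can be turned into a rough back-of-envelope estimate. This is only a sketch: the 2.2 MB per station-day and the 500 stations come from the comment, while the 1024 MB memory budget, the two-year span, the single channel per station, and all function names are assumptions made for illustration.

```python
# Back-of-envelope sizing for station-day miniSEED files.
# Figures taken from the thread: ~2.2 MB per station-day at 100 Hz, 500+ stations.
# Everything else (memory budget, number of years, one channel per station)
# is a hypothetical choice for illustration.

MB_PER_STATION_DAY = 2.2
MEMORY_BUDGET_MB = 1024      # "1 GB", taken here as 1024 MB (assumption)
N_STATIONS = 500
N_YEARS = 2                  # hypothetical; the thread only says "years of data"


def station_days_per_file(budget_mb, mb_per_day):
    """Number of whole station-days that fit in a file of budget_mb."""
    return int(budget_mb // mb_per_day)


def station_day_file_count(n_stations, n_years, days_per_year=365):
    """Number of station-day files for one channel per station."""
    return n_stations * n_years * days_per_year


def total_size_tb(n_stations, n_years, mb_per_day, days_per_year=365):
    """Total archive size in decimal terabytes (1 TB = 1e6 MB)."""
    n_files = station_day_file_count(n_stations, n_years, days_per_year)
    return n_files * mb_per_day / 1e6


if __name__ == "__main__":
    print("station-days per 1 GB file:",
          station_days_per_file(MEMORY_BUDGET_MB, MB_PER_STATION_DAY))
    print("station-day files:",
          station_day_file_count(N_STATIONS, N_YEARS))
    print("total size (TB): %.3f"
          % total_size_tb(N_STATIONS, N_YEARS, MB_PER_STATION_DAY))
```

Under these assumptions, more than a year of one station's data fits in a single 1 GB file, whereas the station-day layout produces several hundred thousand files for the same archive.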
Hi @mdenolle,
This is one of the main motivations for adding the ASDF output option. We do not anticipate adding any capability for alternate file organizations of the miniSEED output at this time; there is simply no generally "correct" answer for such organization, and arbitrary organization makes ROVER more complex (it's already quite complex!). Instead, we will be offering ASDF (and perhaps other HDF5 formats) and providing abstraction interfaces such as portable-fdsnws-dataselect and a direct read module in ObsPy (not yet released, but prepared here: obspy/obspy#2206). For many users this means they do not need to consider the individual miniSEED files themselves.
In this case ROVER worked as expected: you had already downloaded that data, so it did not need to download it again.
The issue here is that you did not remove the data index, just the data store. In the data directory you will see a
The ASDF output from ROVER is an extra processing step added to the normal workflow of collecting miniSEED from the DMC. There is no difference between these modes in transmission (download) or extraction from the data center. Also, for a number of reasons, creating the ASDF at the DMC is not a good option, mainly because it does not scale well. What your results highlight is a performance issue in converting the downloaded miniSEED to ASDF. This is an excellent target for us to investigate, and while we do not control the HDF5 and ASDF libraries, hopefully we can improve this part of ROVER. We greatly appreciate this feedback and encourage you to continue to file tickets for issues with ROVER. |
Regarding the performance difference when building ASDF from gappy data, I've posted the issue to the pyasdf project: |
Hi,
We are trying ROVER in the hope of collecting data on the order of tens of TB. Here are a few things we (users) would like to see:
Thanks!