Requirement: Identification of non-raw, derived data #10

Open
krischer opened this issue Jan 3, 2018 · 17 comments

Comments

@krischer
Contributor

krischer commented Jan 3, 2018

Allow for identification of non-raw, derived data (e.g. processed data, quality parameters, metadata versioning, synthetic data).

@chad-earthscope
Member

We support this and are increasingly seeing the need to clearly identify synthetic, processed and derived data.

This requirement seems like a sub-bullet to #4, which is a relatively large sub-topic in its own right.

@krischer
Contributor Author

krischer commented Jan 8, 2018

Any ideas what this could look like? A free-form ASCII string after the identifier proposed in #4?

@crotwell

crotwell commented Jan 8, 2018

Especially for derived data, we should be able to identify the channels that the new time series came from. For example, some people record the latency of an input channel at a receiving node as a new time series. Another case, with more than one source channel, would be deriving a North channel from a borehole instrument with non-traditional orientations.

A standard "derived from" key could be done as part of the optional/additional headers. This does somewhat mix metadata into timeseries data, but for items as simple as latency or rotations it might be acceptable, and as far as I know StationXML does not have the ability to specify this type of derivation.
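To make the "derived from" idea concrete, here is a minimal sketch of what such a key might look like. It assumes the optional headers can be represented as a JSON object; the key names `DerivedFrom` and `Description` and the channel identifiers are hypothetical and are not defined anywhere in this thread.

```python
import json

# Hypothetical key name; the thread does not define one.
DERIVED_FROM_KEY = "DerivedFrom"


def make_extra_headers(source_ids, description=None):
    """Build an optional-headers dict listing the source channels of a derived series."""
    if not source_ids:
        raise ValueError("a derived series needs at least one source channel")
    headers = {DERIVED_FROM_KEY: list(source_ids)}
    if description is not None:
        headers["Description"] = description  # hypothetical free-text field
    return headers


# Latency series derived from a single input channel (identifier is made up).
latency = make_extra_headers(["XX.STA.00.HHZ"], "receive latency")

# North component derived from two borehole channels with non-traditional orientations.
north = make_extra_headers(["XX.BORE.00.HH1", "XX.BORE.00.HH2"], "rotated to north")

# Serialize for storage alongside the time series (assumes JSON-encoded headers).
encoded = json.dumps(north, sort_keys=True)
print(encoded)
```

A list is used even for the single-source case so that readers of the header never need to handle two shapes.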

I would argue that, unless the processing or derivation is trivial or close to it, it is better not to mix determining the codes of a new channel, an identification problem, with linking to the source channels, a metadata problem. This is especially true if the fundamental nature of the data changes, e.g. the latency of a ground motion channel.

@jmsaurel

It looks a little like the data quality flag of miniSEED 2.4 (R, D, Q or M), but with extended capabilities, doesn't it?

I'm in favor of something that clearly identifies synthetic channels or derived channels (i.e., samples whose values from the digitizer have been modified). Maybe an extended version of the data quality flag.

I'm not in favor of placing information about where this new data comes from in the data itself. That should be kept in the metadata.

Quality verifications that do not affect the sample values at all (i.e., only qualifying or removing bad data) could be indicated through the versioning in #13.

@krischer
Contributor Author

A simplistic possibility would be to somehow enhance the quality codes and add two new codes for synthetic and derived data (are there other broad categories?) and then delegate further details to the arbitrary headers of #14 as proposed by @crotwell.
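As a rough illustration of this option, the sketch below keeps the four miniSEED 2.4 quality indicators and adds two more codes, with any further detail passed in through optional headers. The letters `S` and `P` and the header key `Method` are hypothetical and are not proposed anywhere in this thread; note that `D` is already taken, so "derived" would need a different letter.

```python
# Quality indicators defined in miniSEED 2.4.
MSEED2_QUALITY = {
    "D": "state of quality control unknown",
    "R": "raw waveform data, no quality control",
    "Q": "quality-controlled data",
    "M": "data center modified, sample values unchanged",
}

# Hypothetical additions for illustration only.
PROPOSED_QUALITY = {
    "S": "synthetic data",
    "P": "derived (processed) data",
}

ALL_QUALITY = {**MSEED2_QUALITY, **PROPOSED_QUALITY}


def describe(code, extra_headers=None):
    """Return a readable label, appending details from optional headers if given."""
    if code not in ALL_QUALITY:
        raise ValueError(f"unknown quality code: {code!r}")
    label = ALL_QUALITY[code]
    if extra_headers:
        details = ", ".join(f"{k}={v}" for k, v in sorted(extra_headers.items()))
        label = f"{label} [{details}]"
    return label


print(describe("R"))
print(describe("P", {"Method": "rotation"}))
```

A single code like this cannot express overlapping categories, such as a synthetic series that has also been processed.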

@andres-h

Would BHZ be a "derived channel", since it is derived from HHZ?

@jmsaurel

If BHZ comes directly out of the digitizer, I wouldn't call it a "derived channel", because you don't know how it's made inside. It could be derived from the HHZ, but it could come from a different filter stream.

But if BHZ is made by the acquisition software (SC3, for example), then it could be called a "derived channel", because it is no longer data that comes straight out of the digitizer box.

@tim-iris

Isn't this really an issue where we are implying that we must capture provenance? If so, and I think it is, then I do not think this really belongs in the time series exchange format. Provenance is a much bigger issue and could unnecessarily complicate things. Any expansion of the quality code should be thought through very carefully. I have concerns with this.

@krischer
Contributor Author

Summary

(Please let me know if I missed a point or misunderstood something)

This is a bit of a complicated issue. I think we agree that full and proper provenance is not in the scope of the next generation data format and must be delegated to the metadata in some form. Also, where exactly this information should go in the format is not clear, and there are many possibilities. So please vote on the following question:

Should there be a simple way to flag time series in the new format as either "raw" (whatever the exact definition of that is), "derived" (not "raw"), or "synthetic" (not based on actual recordings)? (Yes/No)

@crotwell

Yes

@chad-earthscope
Member

Yes.

@kaestli

kaestli commented Jan 30, 2018

No (not as a flag, since the terms are undefined and overlap).
But such streams should have different streamIDs and different metadata.

@ozym

ozym commented Jan 30, 2018

Yes

@claudiodsf

Yes, but not a single flag, since the three definitions can overlap.
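One way to allow overlapping categories is a set of independent bits rather than one exclusive code. The sketch below uses Python's standard `enum.Flag`; the class name `DataOrigin`, the bit assignments, and the choice to give "raw" its own bit are all assumptions made for illustration, not part of any proposal in this thread.

```python
from enum import Flag, auto


class DataOrigin(Flag):
    """Hypothetical origin bits; several can be set on the same time series."""
    RAW = auto()        # as delivered by the digitizer
    DERIVED = auto()    # sample values were modified or computed
    SYNTHETIC = auto()  # not based on actual recordings


# A synthetic seismogram that was subsequently filtered carries both bits.
filtered_synthetic = DataOrigin.SYNTHETIC | DataOrigin.DERIVED


def is_synthetic(origin):
    """True if the SYNTHETIC bit is set, regardless of any other bits."""
    return DataOrigin.SYNTHETIC in origin


print(filtered_synthetic.value)                      # integer that would be stored
print(is_synthetic(filtered_synthetic))
print(DataOrigin(filtered_synthetic.value) == filtered_synthetic)  # round trip
```

The combined value fits in a single byte, so this costs no more space than a one-character quality code.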

@ihenson-bsl

Yes

@ValleeMartin

Yes, but taking into account that the definitions can overlap.

@JoseAntonioJara

No. I think this feature should be specified together with the rest of the channel's metadata.
