Data Version Field #12
I feel like this would be a bit dangerous. I think only data centers should set this field, and we should recommend that all processing software set this field to 0 upon writing.
Question: does this change after quality-control procedures if nothing about the data was changed? In other words, is this a version of the data that increments on change, or a state of processing that indicates whether it was retrieved before or after QC? I am OK with either, but maybe it should be explicit what causes the value to increment. I also now think that any user-modification versioning should go into the extra stuff. It would be good to have a standard key for it, but not as part of the fixed header.
This is what was in the 20170622 draft:
I would think it increments on change or, when incremented, implies that something might have changed. The motivation for IRIS is that we have been using the … So I would like a "version" with the primary goal of identifying a later copy of the data, and to mostly steer away from semantic meaning beyond a very few classes: 0=converted, 1=raw, and >1=later. The meaning of any of the versions for a particular time series is probably best kept outside of the miniSEED.
Agreed, except we should be able to robustly separate the cases of 0=converted and 1=raw. So sac2mseed should write version 0, so it can be identified separately from Nanometrics equipment writing the raw data with version 1. In the DMC's case we would let the operator/owner set the version, and we would only change it when the operator is no longer available, or in coordination with the operator when changing the data for some reason.
Small thing, but maybe the "unknown" value should be 255 and the original data-logger raw should be 0. I can envision confusion, as it looks like data might transition from 0 to 1, but 255 is obviously different.
The problem with "unknown" = 255 is that it's bigger than any other version, so the straightforward test for identifying "later" would always need a special-case check. I don't think most people will see this, so it won't have a chance to be obvious; it will be a program doing the check instead. There will probably be some muddling between versions 0 and 1 by data generators that did not adhere to the recommendations, and that's OK. The key principle remains in effect: a larger number means the data is later and more preferred in any exchange scenario.
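To illustrate the objection, here is a minimal sketch of a "later" comparison, assuming the 255-as-unknown proposal. The `UNKNOWN` constant and `is_later` helper are hypothetical, not part of any spec draft:

```python
UNKNOWN = 255  # hypothetical sentinel under the 255-as-unknown proposal

def is_later(a: int, b: int) -> bool:
    """Return True if version a is later (more preferred) than version b.

    With unknown mapped to 255, the simple "larger is later" rule needs
    a special case; otherwise "unknown" data would always win.
    """
    if a == UNKNOWN or b == UNKNOWN:
        return False  # cannot order an unknown version against anything
    return a > b

# Without the guard, 255 compares greater than every real version,
# so "unknown" data would always be preferred -- the problem above.
```

With unknown = 0 instead, plain integer comparison works with no special case, which is the simplicity being argued for.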
But isn't "unknown" always a special case? Shouldn't "user modified" be > "raw"? I'm not sure I understand: if I download the raw data, apply the response, and save it, we said I was supposed to set the version to the "unknown" value, but the data is later than the original data? Maybe we should have two special cases: 0 for unknown, as converted on input for use by data centers, and 255 for modified by the end user post-data-center, meaning the QC trail no longer applies?
Sounds good to me. We should just clearly specify that this value is for data-center operational use only and that the semantics can differ per data center. The spec should also specify that all data converters and processing software set this value to zero.
For me it's a data-center thing, which is good for user <-> data center coordination but otherwise not for users beyond being informational. If users do use it (and how would we prevent them?), they need to keep track of their versions and what they mean. I would recommend that users write extra header(s) to keep track of their processing steps.
@chad-iris So we should drop the "user sets to unknown value on write"? If it is a data-center thing, and the end user is not supposed to use it, then they should never change it. So even after processing steps, it stays the same. The meaning is then not "data version" but "the QC level of the data at the time it was retrieved from the data center". This also allows a user, even after many processing steps, to see whether the data center has issued a new version of the same data and to choose to reprocess in that case. If that is what you mean, then 👍 from me.
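The reprocessing check being described can be sketched as follows; the function name and parameters are hypothetical, chosen only to illustrate the "larger is later" comparison:

```python
def should_reprocess(retrieved_version: int, current_version: int) -> bool:
    """Decide whether a user's processed copy is stale.

    retrieved_version: the value stored with the user's copy, i.e. the
        version at the time the data was retrieved from the data center.
    current_version: the version the data center publishes now.

    Under "larger is later", a strictly greater current version means
    the data center has issued a new version of the same data.
    """
    return current_version > retrieved_version
```

Because the user never rewrites the field during processing, this comparison stays meaningful no matter how many processing steps have been applied.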
@crotwell That's mostly what I mean, exception is:
I'm very hesitant to attach much notion of quality, unless the definition of quality is "last is best" with no deeper meaning. |
@chad-iris Right, my typo; the meaning is "the data version at the time it was retrieved". Larger is better, and the meaning is up to the individual data center. 👍
We might need a better name for this: "data version" makes it sound like this is the current version of the data, which is what I was trying to say it is not. Maybe something like "publication version"?
Discussion branched off #2. Concerns DRAFT20170622.
@crotwell