-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Technical evaluation of the proposed source identifiers #31
Comments
Hi Three issues I see immediately. First is minor, but I think you want to decorate the Second, is it looks like the regular expressions allow lower case letters, but the new spec only allows upper case letters. I know SEED technically allows lower case, but we probably do not want to have a disagreement between the example code and the new spec. They should be consistent, either both allowing lower case or both forbidding it I think.
but
Third, it doesn't check for length, so the mapping is wrong in cases where the mapping should fail. For example an invalid SEED code like this fails (correctly):
but this passes even though it is wrong in the same way, just location instead of channel length:
and of course this location code is illegal even in the new spec. Would be good in this converter checked that input NSLC were valid SEED lengths (2,5,2,3) and on the reverse mapping failed for valid source ids that are too big to map to SEED style NSLC. It should also check the lengths of the N S L B S sS elements of the source id, 8,8,8,... Actually, looking at the spec, are their length restrictions on the band, source and subsource codes? I know the intent is for source codes to be FDSN issued, but subsource codes are mostly up to the user. Regardless a length limit would probably be really helpful to software. Did we miss that or am I just not seeing it? |
Also be nice if the temp network convention was part of the converter.
could do this instead:
|
Thanks @crotwell. The code started out as a mapping exercise with a little sprinkling of validation. I was unsure how much validation to add, but I agree that this is a good place to be strict. Essentially it will reduce to the strictest combination of SEED and SourceID rules. Note: there is a rather important class of usage where decomposing the SourceID into components that are SEED-like but do not conform to SEED rules. This is needed to store the longer-than-SEED codes in databases and other systems/formats; it will be the norm for a long time.
Fixed.
Fixed.
Fixed.
There are no length restrictions beyond using defined defined codes with fixed lengths. I disagree that subsource are mostly up to the user, and I doubt folks would come up with huge values because it made sense for their case. In libmseed the entire SourceID is capped at 64 bytes, seems a rational approach for any software. |
Can you clarify the purpose of the converter? Is it an example of how sourceIds work in code, to show how to convert valid NSLC to/from valid SourceIds or to provide a tool to store sourceIds in a database that was set up to handle separate NSLC as 4 string fields? The first two seem plausible uses as a "technical evaluation", but the last seems too specific to whatever system will use it. I guess my concern is that this current code shows a "mapping from sourceId to NSLC" that isn't in line with the spec, which says if the lengths are not ok, then there is no mapping.
which seems to imply that there is a mapping from these long codes to NSLC. I should say this code makes total sense in terms of IRIS storing things or searching for things in its database, but less sure that it makes sense in terms of a technical example of source ids in code in practice and may generate the idea that there are mappings that are not valid. I tossed together some javascript and a web page to do conversions with I think more strict checking. Perhaps it is useful in addition? http://www.seis.sc.edu/~crotwell/fdsn_source-identifiers/converter/ |
Pull request here: #32 |
It was for technical evaluation of the specification, with the hope that it would be useful for generic decomposition into individual codes and back. But this did not allow strict adherence to SEED limits and, as you pointed out, may leave the wrong impression of validity for all cases. So, I've re-arranged the code to focus on SourceID to/from SEED conversion and enforce SEED limitations.
that now produces:
But it is, depending on what you think NSLC means. This specification redefines all assumptions of what "NSLC" means, and we should be very careful to clarify when it is "SEED NSLC" versus the expanded codes.
Another example: the FDSN web service request formats/parameters require network, station, location, and channel fields. There will be oodles of places like this as nearly all existing codes deal with the NSLQ-quartet as individual codes at some level. The mapping between Source IDs and its constituent codes will be happening for a long time and have nothing to do with the SEED specification directly. |
Good point, when I hear NSLC I think SEED, but that is maybe no longer true. But if I am confused then there are lots of others that will be too, so being extra clear is worth while. Does the spec need to address this mapping to "generic" NSLC in addition to SEED NSLC? While the NSL parts are clear, I suppose there is some ambiguity in how to make a single string "channel" from the band, source, subsource. Basically should the spec say something along the lines of:
There is I guess some choice in how to map 'B_H_Z' as it could be 'BHZ' to correspond to SEED style, but perhaps also 'B_H_Z' to be consistent with longer source codes. If both are allowed, then any "equals" operation has to check two cases, which can be inefficient and lead to potential bugs. Maybe this doesn't matter too much, but would be good to be proactive to ensure as much consistency as possible. |
I vote no, mostly because there is no one clear way:
I think that's the problem, different implementations/uses will have different needs. What is clear is that As you say, the channel is where it gets potentially confusing. I think it will be very common to collapse the extended I've tried to address that in the expanded "Note" at the top of the channel page: http://docs.fdsn.org/projects/source-identifiers/en/v1.0/channel-codes.html Let me know what you think. |
So, I guess I am still feeling that this may add more confusion. It seems like there are 3 concepts floating around:
The spec is about 2, and includes a mapping to/from 1 which is needed as it is the current default. All makes sense to me. You also say mapping to 3 should not be in the spec, I agree. But the new Note and the python example seem to be mostly about 3. Just to be clear, there is nothing wrong with 3, it probably will be a common way to use things for some time to come, but I am not sure it belongs in the source id spec as it creates the idea that there is a cannonical way to break a single string source id into 4 substrings that look like NSLC string, just longer and different. This mapping makes total sense in a note on how to use source ids in existing fdsn web services, or in documentation on how to use them in stationxml, but maybe this mapping idea would be better given there instead of here so it doesn't muddy the waters. Another concern is that the Note is kind of duplicating the SEED mapping idea from the earlier section. I feel like all stuff about how to map to/from should live in just one section. My $0.02 is that you are promoting using a single string channel id, which is a really good idea. Don't weaken the argument by sort of kind of introducing a variation of the single string that is split into 4 strings. My advice would be to not say anything about idea 3 in the spec at all, and have the example code illustrate the strict SEED NSLC <-> SourceId mapping, but not go beyond that. The spec will be stronger for being more limited in what it does. |
Agreed. I've removed the comment in 2aa0507 |
Thoughts on javascript example/pull request? |
I added comments to the PR. Seems incomplete. I'm not opposed to adding it, but I'm curious what you consider the value is beyond another try at converting? Also, is this holding up the technical review addition to the report? |
Don't have to add it, I was just thinking a straight converter web page would be useful. Making it available somewhere so people can play might be a easier than having to run python locally. I really don't care that much about it, unless having multiple languages represented as examples is useful? |
Thinking some more, I would suggest you reject this pull request. I can release this separately and it doesn't need to be part of the fdsn repo. |
Hi all, I did a very basic check of the code, much less complete than Philip. The only question I have is that when I enter:
I get
Whereas I think Philip had suggested that it give
Is this too unspecific to code, or just not desirable? Cheers |
My nit is that this mapping is not universally true and adding it to the converter may imply that it is. While the FDSN will reserve the codes it's up to data centers if they will accept the codes equally. Second nit is that it's irreversible, one-way mapping. But I agree that its useful for illustrate as long as it's clear the mapping is being employed, so now the converter produces:
Now that I re-read that section of the spec. I note we've left too much vagueness in this:
which needs to be qualified that the 2-char code needs to start with XYZ0-9 explicitly. Obviously we cannot have
Let me know if you disagree. |
+1 on your code and on the change to the spec. |
Sounds good to me |
Pre-release versions of libmseed implement functions to convert the originally proposed identifiers to and from individual SEED codes. Furthermore, at the IRIS DMC, we have implemented database functions to convert between these identifiers and SEED codes.
To further evaluate the technical feasibility and practicalities of these identifiers, I have written a Python module that validates the identifiers and converts to and from SEED codes. The module includes a "main" for use as a standalone converter, and is included in the repository: fdsn_source_identifiers.py
For all of these software packages, mapping of details between the two standards follows the description in the specification:
http://docs.fdsn.org/projects/source-identifiers/en/v1.0/definition.html#mapping-of-seed-2-4-codes
With this experience, I'm quite confident in the feasibility of the proposed identifiers. They are usable and a good path exists between them and SEED codes.
@crotwell, @WayneCrawford, and @flofux I encourage you to try the converter to see if this matches your expectations.
The text was updated successfully, but these errors were encountered: