Early 2023 meeting to organize an identifiers task group #36
Please note that on 2017-10-18, the Executive Committee recorded a decision in their issue tracker (https://github.com/tdwg/exec/issues/90, not publicly viewable): "The XC has agreed (XC1709) to deprecate the GUID AS." However, no notation was made in the standards documents either that the umbrella standard (http://www.tdwg.org/standards/150) was moved into the retired standards category or that the LSID Applicability Statement document (http://rs.tdwg.org/guid/doc/lsidas/, one of two documents in the standard) was marked as deprecated. So this deprecation should probably be considered unimplemented. Note also that on multiple occasions at TAG meetings in 2022, members stated that even if TDWG no longer recommends the adoption of LSIDs as a technology for new systems, LSIDs are in wide use. Therefore it is questionable whether it is appropriate to deprecate the LSID AS. |
@baskaufs Let's say that
Note added: my hope was to help this particular group understand the need to store these old LSIDs (in perpetuity) even if they won't resolve ever again. The fact that they are cited in literature means that when the related object is referenced, it has that LSID as one particular "identifier" associated with it, and that remains useful. This group just wanted to forget about their LSIDs rather than import them into their new database. |
@debpaul: @rdmpage could say better, but my understanding is that LSIDs are still being minted in existing projects (like ZooBank and others) where their generation was established as part of the standard workflow years ago. So even if they don't resolve, they still are basically string identifiers for those objects. So one would treat them as any other authoritative string identifier and keep them. I think the main point of clarification is that, for people setting up new identifier systems, TDWG no longer recommends them as a preferred identifier. But I defer to the actual experts on the subject. |
OK, we need to stop thinking that LSIDs are no longer being used! The recent review below lists pretty much every identifier type used in biodiversity informatics, and LSIDs feature several times. Databases such as ZooBank, IPNI, Index Fungorum, WoRMS, SpeciesFile, etc. use them. There are literally millions of LSIDs in the wild, and more are minted each day as new taxonomic names are published. What has changed since, say, 2005, is that many (but not all) LSIDs are no longer resolvable using the original LSID protocol. Hence web proxies such as https://lsid.io, which aim to make LSIDs still work. One way to think about the current situation is to imagine that DOIs stopped resolving, but people still minted them and included them in papers, etc. This is where we are with LSIDs. If TDWG is going to kill LSIDs, then IMHO it should
I guess it seems bizarre that TDWG's one significant foray into persistent identifiers crashed and burnt shortly before everyone got religion about persistent identifiers...
|
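To make the proxy idea above concrete, here is a minimal sketch of checking that a string looks like an LSID and turning it into a proxy URL. It relies on the general LSID shape `urn:lsid:<authority>:<namespace>:<object>[:<revision>]`. The assumption that the proxy accepts the LSID appended to its base URL, the function names, and the `example.org` identifier are all illustrative and not taken from any TDWG document.

```python
import re
from urllib.parse import quote

# General LSID shape: urn:lsid:<authority>:<namespace>:<object>[:<revision>]
# This is a simplified syntactic check, not a full validator.
LSID_PATTERN = re.compile(
    r"^urn:lsid:[^:\s]+:[^:\s]+:[^:\s]+(?::[^:\s]+)?$", re.IGNORECASE
)

# Assumption: the proxy resolves an LSID appended to its base URL.
PROXY_BASE = "https://lsid.io/"


def is_lsid(value: str) -> bool:
    """Return True if the string has the basic shape of an LSID."""
    return bool(LSID_PATTERN.match(value.strip()))


def proxy_url(lsid: str, base: str = PROXY_BASE) -> str:
    """Build an HTTP URL that asks a proxy to resolve the LSID."""
    lsid = lsid.strip()
    if not is_lsid(lsid):
        raise ValueError(f"not a syntactically valid LSID: {lsid!r}")
    # Keep the colons readable; percent-encode anything unsafe in a URL path.
    return base + quote(lsid, safe=":")


if __name__ == "__main__":
    # Hypothetical identifier, used only to show the shape.
    print(proxy_url("urn:lsid:example.org:names:12345"))
```

A database that stores legacy LSIDs as plain strings could use something like this to offer a clickable link, while still keeping the LSID itself as the recorded identifier.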
@rdmpage Please note that I did NOT say that LSIDs were no longer used. I think it is true that TDWG no longer says that they are the recommended identifier (as it once did). |
With all the healthy discussion about identifiers happening in various places at the moment (here, TDWG Slack, literature, etc), can anyone recommend a nice (free) online knowledge management system where we could keep this all together? Something like a wiki but also allowing for discussion, debate, comments, differing opinions, points of view, and so on in the knowledge creation process? It would be nice to draw this all together in something a bit more stable/permanent and openly available in anticipation of the proposed meeting. Suggestions welcome. (Just throwing in my two cents' worth: perhaps the TDWG decision to deprecate LSIDs needs to be reconsidered, as things may have changed since then. Personally I like LSIDs; they're user friendly compared to UUIDs, for example. That may be useful if our goal is community adoption of identifiers for things, à la ORCIDs for people, rather than only as pointers to database records.) |
@ianengelbrecht What is missing using GitHub for the knowledge management system you are talking about? |
It's an early idea, but this issue is an example - it's a task to arrange a meeting, and we have important information on the meeting topic being added here already. So later on, someone has to refer back to this issue, and various other places, to find and synthesise all the information being offered. It'll be less discoverable when the meeting is had and this issue is closed too. Also, I'm not sure that GitHub issues are the ideal means of eliciting inputs from a group of people this early on in a discussion. We tried to do something like this with the tdwg/apis group recently. We had a fruitful and vibrant meeting about how APIs should work. The core questions and topics were identified, taken across to GitHub and one issue created for each. But it kinda died after that. I'm a fan of collaborating on Google Docs, where good discussion on a topic is possible with comments, and the document updated accordingly, but a Google Doc becomes unwieldy when there's lots of information, lots of people, lots of discussion and lots of editing. I see Notion are touting their platform as a wiki option, with commenting functionality. I'm sure there are other possibilities too. All just thoughts at this stage. |
@rdmpage @tucotuco @ianengelbrecht @baskaufs please note my comment/question about LSIDs above was raised by me, in a meeting this morning with taxonomists. Scenario: what to do with LSIDs that were minted in a given database (and published) in the past, but will no longer be resolving.
I never meant to imply that they would not be used by everyone / anyone ever again ... (I'm editing the above to clarify). |
@ianengelbrecht I know exactly what you mean. Anyone entering this world now (See above about folks who were new to the "identifier" vs "resolution" ideas) has a difficult time stepping in, when they read a GitHub thread like this one. Darwin Core Hour ... wiki might be one model solution. Another could be the GitHub "discussion" page where we could move these longer conversations and annotate them. ... |
I realize that the GitHub system has its issues of people not knowing how to use it and things being somewhat fragmented. But it has the huge advantage that work does not get lost, which soon becomes a problem when a project grows. Also, it is the repository of record for making public and preserving the work of TDWG groups. So if you use another system, you'll have to figure out later how to move the significant content to GitHub for archiving. One thing that I think works relatively well is to do editing and hashing out in Google docs, then exporting them as PDFs and uploading to GitHub when they are no longer being worked on. GitHub will render PDFs fine and with an organized directory structure, things stay findable. Some other system might be better, but somebody has to set it up and depending on how complicated it is, it may be no easier for people to use than GitHub. Another thing about the Issues Tracker feature in GitHub is that if you make good use of the tags and milestone features, it is reasonably easy to keep track of what's going on. I haven't used the other features @debpaul mentioned (wiki and discussion). They would have the advantage of archiving the work automatically. One thing I've observed about GitHub wikis is that because they haven't been used much in TDWG, people don't think of looking at them. So if the wiki is used, one would want to say prominently on the repository landing page that people should look at the wiki to see the content. |
There's a bunch of things to unpack, for example:
Perhaps we could tease these apart and offer guidance on each? For example, I had a good discussion with @mdoering during TDWG 2022 about identifiers.org which provides standardised ways to refer to PIDs independent of particular ways to resolve them. We should provide a summary of the main contenders for PIDs (pros and cons), especially in terms of what work would be involved in each case (e.g., HTTP URIs are free, but you have to ensure they persist, DOIs use indirection so you can change URLs at will so long as you update DOI, etc.). Maybe cover DOI, Handle, ARK, HTTP URI, and LSID. Give actual examples with pros and cons. Discovery matters, otherwise people aren't likely to use other people's PIDs (which means we don't get any real benefits from PIDs). In many ways this is analogous to geocoding - going from a locality description to (lat,lon) coordinates. Resolution matters if anyone wants to build something on top of PIDs, it's also the best way to see if a PID actually means anything. If you don't make them resolvable you have no skin in the game, which implies the PID has no value to you (so why would anyone else care?). Perhaps where this is heading is a (hopefully) short document that sets out all the questions the TDWG community should be asking (basically the four above), coupled with a set of possible answers from which they could make a choice (or at least use to start the decision making process). |
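As a starting point for the "actual examples" suggested above, here is a rough sketch that sorts identifier strings into the five candidate schemes by syntax alone. The regular expressions are simplified approximations that I am assuming for illustration; they are not the official grammar of any scheme, and the Handle and ARK strings used in testing are made up. Recognising the syntax says nothing about whether an identifier resolves or persists.

```python
import re

# Order matters: specific schemes are tested before the generic HTTP URI.
# All patterns are simplified heuristics, not official scheme grammars.
SCHEMES = [
    ("LSID", re.compile(r"^urn:lsid:", re.I)),
    ("DOI", re.compile(
        r"^(?:doi:|https?://(?:dx\.)?doi\.org/)?10\.\d{4,9}/\S+$", re.I)),
    ("Handle", re.compile(
        r"^(?:hdl:|https?://hdl\.handle\.net/)\S+/\S+$", re.I)),
    ("ARK", re.compile(
        r"^(?:https?://[^/\s]+/)?ark:/?\d{5,}/\S+$", re.I)),
    ("HTTP URI", re.compile(r"^https?://\S+$", re.I)),
]


def classify(identifier: str) -> str:
    """Return the name of the first scheme whose pattern matches."""
    text = identifier.strip()
    for name, pattern in SCHEMES:
        if pattern.search(text):
            return name
    return "unknown"


if __name__ == "__main__":
    for example in [
        "urn:lsid:example.org:names:12345",     # hypothetical LSID
        "https://doi.org/10.3897/rio.7.e67379",
        "http://rs.tdwg.org/guid/doc/lsidas/",
    ]:
        print(example, "->", classify(example))
```

A summary document could pair each scheme name with its pros and cons (for instance, the point that HTTP URIs are free but must be kept persistent by whoever mints them), using a classifier like this only to pick the right row for a given identifier.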
This is a summary of the discussions held on TDWG TAG Slack channel to date, so we have it here for posterity: Rob Sanderson [4:18 PM] Rob Sanderson [4:30 PM] Roderic Page [6:13 PM] Rob Sanderson [9:56 PM] Deb Paul [11:55 PM] Roderic Page [1:58 PM] [sic] The applicability statements are available at https://www.tdwg.org/standards/guid-as/ 3 replies Steve Baskauf [5 months ago] Jonathan A Rees [5 months ago] Sunday, October 16th Roderic Page [11:34 PM] David Shorthouse [1:52 AM] Roderic Page [5:02 PM] |
The meeting document and brief notes from the meeting held on 21 March 2023 are available here. |
Reading up on the (not) reached conclusions in the shared minutes ... I am looking forward to how the TAG is going to propose any next steps in this story. My take: The current immobility in this is not helping anybody, and only leads to unresolvable discussions and non-negotiable positions of well-intentioned people trying to be "right", while some "not entirely wrong but at least helpful" could end up being a practical version of "right enough"? To overcome this we could accept the reality of where lsid and lsid-as have landed over time, and stop trying to change or fight that, but instead go into transparent "legacy management" and keep something of a managed list of valid criticisms, and for those (where possible) try to gently advise (or simply document) how people are practically dealing with them? Just 3 off the bat examples to get us started:
In fact dealing with this in "legacy management mode" might be the thing that liberates the TAG to formulate some alternative that can be "right" and self-motivating towards practically replacing what we have now? |
Given "strong community memories in relation to the failed life sciences identifier (LSID) scheme" [A choice of persistent identifier schemes for the Distributed System of Scientific Collections (DiSSCo)](https://doi.org/10.3897/rio.7.e67379) yes, legacy mode makes sense. URN registration doesn't seem to affect anyone. Apart from NBNs I hardly see URNs being actively used (although there are a few made up ones in GBIF). Fixing R29 and R31 seems irrelevant if LSIDs are being actively deprecated. This just leaves maintaining a resolver. We need to be clear about the scope of this. Does it only support destinations that still have (semi-)functioning LSID support, or does it offer resolution even if LSID support no longer exists (which is what https://lsid.io does)? Is TDWG going to commit to support this (and what would that look like)? Perhaps do the following:
|
Here is a technical note on why the GUID and Life Sciences Identifiers Applicability Statements page includes both of the applicability statements. For historical reasons unknown to me, it was decided to ratify both applicability statements as two documents that were part of a single standard. I think it is because it was originally envisioned that there would be many applicability statements (one for each technology: HTTP IRIs, DOI, etc.) to go with the umbrella GUID AS. That did not happen. After adoption of the Standards Documentation Specification, it was implemented using this model for relationships among standards components and with the IRI design patterns listed on that page. The SDS decreed that each standard must have a landing page to which the "permanent URLs" for the standards dereferenced. It was decided that the standards pages on the TDWG website would be the landing pages (vs. a GitHub repo README). So if you dereference http://www.tdwg.org/standards/150 it will take you to the standards page on the TDWG website that we are talking about. The SDS says that a standards landing page must clearly state the parts of a standard. In this case, there are two parts: the two AS documents. They have been assigned permanent IRIs of http://rs.tdwg.org/guid/doc/guidas/ and http://rs.tdwg.org/guid/doc/lsidas/, which dereference to the actual PDFs in GitHub. So with the current status of the standard, it's not possible to mess with the structure of that page because it's required by the SDS to contain the current information about what's included in the standard. Administratively, we could create a new standard with a new permanent URL and then move one or the other AS documents to that new standard. The question is: which one would be most disruptive to move. 
If people have been citing the permanent URL of the standard (as they should be), then one or the other of the AS documents would not be found via the standard landing page (although one could put a note there saying that it has been moved). It seems to me that if the GUID Task Group gets off the ground, it should create a new standard with a new permanent URL and maybe a different name. It would then have its own standard landing page separate from the page for http://www.tdwg.org/standards/150 (the one that lists the LSID AS). The old GUID AS could then be deprecated and removed from the list of docs at http://www.tdwg.org/standards/150 with a note on that page saying that the old GUID AS has been replaced by the new doc (whatever it's called). The header section of the new doc would have an entry with a link to the replaced GUID AS so that it would be easy to find the previous version. |
thx for these responses @rdmpage and @baskaufs, does feel a bit like the gist of my suggestion got lost in translation? With "legacy mode" I am hinting at slightly more than a formal deprecation and replacement, I also see the need for some acceptance of the legacy that has been created, and making sense and minimizing cost of that towards those that invested into it? From that angle, I should maybe rephrase my questions / take up some responses / make myself more clear:
ok, but should be considered against the cost (or chance) of "What if somebody else hijacks the lsid urn at IANA to start meaning something different" -- One could argue that it would not only hinder uses that are slow in switching away from a deprecated standard, but also damage the trust of any future proclaimed "persistent identifier" coming from tdwg? also: the fact that "they are no longer / not often used" is ignoring the fact that these lsid recommendations have introduced some legacy use. And imho "legacy mode" is about dealing with that fall-out elegantly and responsibly?
This has been out there and in use for some time... how to try and deal with that? So yeah, given the deprecation status it seems logical indeed to guide people towards not using either anymore.
That would translate to: sure, replace the homepage, mention deprecation, better alternatives and "legacy mode guides" but combine that with some statement that the persistence of the proxy form links is guaranteed (owning your legacy when it is about persistence is important imho) |
Just for fun, elsewhere on TDWG there is a discussion about the DwC field |
yes, but no one is actually resolving them. They are just unique strings defined in some datasets elsewhere for resolution. In fact the "real" identifier first has to be extracted from the LSID URN before it can be looked up. Simple CURIEs would have done the job too. |
But if they were resolvable then we wouldn't need to "extract" an identifier, we'd have the identifier already (the LSID). Plus we'd have the ability to check that it was correct (does it resolve to the thing we're talking about?) and potentially learn more (e.g., if the LSID was linked to other information). Of course, we blew this opportunity by having an identifier that is tricky to implement properly, and we created a cargo cult where it is OK to make up things that look like identifiers ("urn:xxx") but which have none of their properties (e.g., resolvability, machine readability, etc.). Insert something here about "spilt milk", etc. ... |
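For readers new to the distinction being argued here, this is a small sketch of what "extracting the real identifier" from a non-resolving LSID looks like in practice, and how the same local identifier could be written as a CURIE. The `Lsid` type, the function names, the `exnames` prefix, and the `example.org` LSID are all hypothetical; only the colon-separated LSID layout is assumed from the LSID syntax.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    """The colon-separated parts that follow 'urn:lsid:'."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(value: str) -> Lsid:
    """Split an LSID string into its components, or raise ValueError."""
    parts = value.strip().split(":")
    has_prefix = [p.lower() for p in parts[:2]] == ["urn", "lsid"]
    if not has_prefix or len(parts) not in (5, 6) or "" in parts[2:]:
        raise ValueError(f"not an LSID: {value!r}")
    return Lsid(*parts[2:])


def to_curie(lsid: Lsid, prefix: str) -> str:
    """Write the local identifier as a compact prefix:id pair.

    The prefix-to-database mapping has to be agreed elsewhere; it is not
    recoverable from the LSID itself.
    """
    return f"{prefix}:{lsid.object_id}"


if __name__ == "__main__":
    parsed = parse_lsid("urn:lsid:example.org:names:12345")
    print(parsed.object_id)             # the "real" local identifier
    print(to_curie(parsed, "exnames"))  # the same thing as a CURIE
```

The extraction step is only needed because the LSID is being treated as a string. If the LSID resolved, a client could dereference it directly and check what it points to.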
At the 2022-11-07 TAG working session, it was agreed to close the three existing issues relating to identifiers (#14, #2, and #9) and replace them with this one. Please refer to those closed issues for background, suggested participants, and discussion about how the issue of LSID should be handled.
Specifically: