RDF URIs for language tags and / or language subtags #13
This ought to be coordinated with the i18n namespace defined in JSON-LD 1.1.
@aphillips, the latest comment from @niklasl is interesting input for our discussion with John Klensin.
Can I get a reference/link to the Library of Congress set of URIs for ISO 639? We are facing an issue with the Lexvo ones:
@jonquet I think you might be confusing what ISO 639 does with how language tags are composed. The Library of Congress is a reference for ISO 639-1, but that is not the only part of ISO 639: it covers only the 2-letter codes. The Registration Authority (RA) for ISO 639-3 is SIL International (formerly the Summer Institute of Linguistics); parts 1 and 2 are derived from ISO 639-3 (note that I'm simplifying a lot).

However, language tags are defined by IETF BCP 47. These tags draw on multiple standards, including ISO 639 for languages, ISO 15924 for scripts, and ISO 3166 for countries/regions. These codes (called "subtags") can be composed to form complete language tags. IANA maintains a registry of valid subtags here: https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry. This registry tracks all of the parts of ISO 639 as well as the other standards used in language tags. However, it is a single large file in "record-jar" format containing all of the subtags.

This issue, where we're discussing this, reflects a known gap for RDF: there is no URL reference for composed language tags. This WG investigated what would be required to create one in 2020/2021. It would be possible to do this at IETF/IANA, but no one wrote the Internet-Draft to carry the work forward (cf. action result).
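To make the registry structure concrete, here is a minimal Python sketch (not an official tool) of reading the registry's record-jar layout: records separated by `%%`, `Field: value` lines, and indented continuation lines. It then checks the subtags of a composed tag such as `pt-BR` against the parsed records. `REGISTRY_EXCERPT` is a hand-written excerpt with only the `Type`, `Subtag` and `Description` fields; the real file also starts with a `File-Date` record and carries more fields. All function names are hypothetical, and `classify` only guesses each subtag's type from its position and shape, ignoring extlang, extensions, private use and grandfathered tags.

```python
# Sketch: parse a record-jar excerpt and look up the subtags of a composed tag.
# REGISTRY_EXCERPT is hand-written in the registry's layout, not a copy of the real file.

REGISTRY_EXCERPT = """\
%%
Type: language
Subtag: pt
Description: Portuguese
%%
Type: language
Subtag: sr
Description: Serbian
%%
Type: script
Subtag: Latn
Description: Latin
%%
Type: region
Subtag: BR
Description: Brazil
%%
Type: region
Subtag: RS
Description: Serbia
"""


def parse_record_jar(text: str) -> list[dict[str, list[str]]]:
    """Split record-jar text into records mapping field name -> list of values."""
    records: list[dict[str, list[str]]] = []
    current: dict[str, list[str]] = {}
    last_field = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "%%":
            # "%%" separates records.
            if current:
                records.append(current)
            current, last_field = {}, None
        elif not stripped:
            continue
        elif line[0] in " \t" and last_field is not None:
            # Indented line: continuation of the previous field's value.
            current[last_field][-1] += " " + stripped
        else:
            name, _, value = line.partition(":")
            last_field = name.strip()
            current.setdefault(last_field, []).append(value.strip())
    if current:
        records.append(current)
    return records


def build_index(records):
    """Map (Type, lowercased Subtag) -> record; subtag matching is case-insensitive."""
    return {
        (rec["Type"][0], rec["Subtag"][0].lower()): rec
        for rec in records
        if "Type" in rec and "Subtag" in rec
    }


def classify(tag: str, index) -> list[tuple[str, str, bool]]:
    """Guess each subtag's type from position and shape, then check it is registered."""
    first, *rest = tag.split("-")
    typed = [("language", first)]
    for sub in rest:
        if len(sub) == 4 and sub.isalpha():
            kind = "script"
        elif (len(sub) == 2 and sub.isalpha()) or (len(sub) == 3 and sub.isdigit()):
            kind = "region"
        else:
            kind = "variant"
        typed.append((kind, sub))
    return [(kind, sub, (kind, sub.lower()) in index) for kind, sub in typed]


index = build_index(parse_record_jar(REGISTRY_EXCERPT))
print(classify("pt-BR", index))  # [('language', 'pt', True), ('region', 'BR', True)]
```

Keying the lookup on both `Type` and `Subtag` matters because the same letters can occur as different subtag types (for example, `ro` as a language and `RO` as a region).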
There are also the T and U extensions to BCP 47. At a minimum, the T extension would have to be accounted for: the Library of Congress's increasing use of BIBFRAME and its current preference for romanised data mean that most of its linked data will require T extensions as part of the language tag.
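Extensions are separated from the rest of a tag by single-character subtags (singletons), with `x` reserved for private use. The Python sketch below splits a tag along those singletons; it is purely syntactic, does not validate the extension contents, and the function name is hypothetical. It illustrates why one URI per registry subtag would not, on its own, capture what a T extension says about where transformed (for example, romanised) content came from.

```python
def split_extensions(tag: str) -> tuple[list[str], dict[str, list[str]], list[str]]:
    """Split a BCP 47 tag into base subtags, extensions keyed by singleton, and private use.

    Hypothetical helper; purely syntactic, nothing is checked against the registry.
    """
    base: list[str] = []
    extensions: dict[str, list[str]] = {}
    private: list[str] = []
    singleton = None
    # BCP 47 tags are case-insensitive, so normalize to lowercase first.
    for i, sub in enumerate(tag.lower().split("-")):
        if singleton == "x":
            # Everything after "x" is private use, even one-character subtags.
            private.append(sub)
        elif len(sub) == 1 and i > 0:
            # A one-character subtag starts a new extension (or private use if "x").
            singleton = sub
            if sub != "x":
                extensions.setdefault(sub, [])
        elif singleton is None:
            base.append(sub)
        else:
            extensions[singleton].append(sub)
    return base, extensions, private


print(split_extensions("ru-Latn-t-ru"))
print(split_extensions("de-u-co-phonebk"))
```

Here `ru-Latn-t-ru` reads as Russian in Latin script, transformed from Russian; the `t` part carries the source language that a plain `ru-Latn` tag would lose.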
Thanks @aphillips for the detailed info. This indeed confirms how the 'pt-BR' code is built, and that there is no URI yet to identify those subtags.
Over the years, the RDF community has developed several concrete sets of URIs for identifying languages. Examples:
The URIs in these sets are based on ISO 639, often extended with further URIs to identify languages (or language variants) that are not part of ISO 639, such as under-resourced or historic languages.
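As a rough illustration of how ISO 639-based URI sets relate to language tags, the Python sketch below maps only the primary language subtag of a tag to candidate URIs. The URI patterns (`id.loc.gov` for the Library of Congress ISO 639-1 and ISO 639-2 vocabularies, `lexvo.org` for ISO 639-3) are assumptions based on commonly published data and are not checked against either provider here; the helper name is hypothetical. Everything after the primary subtag, such as `BR` in `pt-BR`, comes back unmapped.

```python
# Assumed URI patterns for existing ISO 639-based sets; verify with each provider.
LOC_ISO639_1 = "http://id.loc.gov/vocabulary/iso639-1/"
LOC_ISO639_2 = "http://id.loc.gov/vocabulary/iso639-2/"
LEXVO_ISO639_3 = "http://lexvo.org/id/iso639-3/"


def candidate_language_uris(tag: str) -> tuple[list[str], list[str]]:
    """Return candidate URIs for the primary language subtag, plus the unmapped subtags.

    Hypothetical helper: it builds strings only and never checks that a URI resolves.
    """
    primary, *rest = tag.split("-")
    code = primary.lower()
    if len(code) == 2:
        uris = [LOC_ISO639_1 + code]
    elif len(code) == 3:
        # A 3-letter code may be in ISO 639-2, ISO 639-3, both, or neither.
        uris = [LOC_ISO639_2 + code, LEXVO_ISO639_3 + code]
    else:
        uris = []
    return uris, rest


print(candidate_language_uris("pt-BR"))
```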
Various groups provide such URIs or the underlying values, e.g. the two efforts mentioned above or the Library of Congress.
Some arguments for providing URIs for language (sub)tags, taken from this thread:
https://lists.w3.org/Archives/Public/public-ontolex/2020Apr/0006.html
Some open questions:
The above is just a summary of what I read from the thread. Below is an observation.
The RDF community "likes" to provide information as URIs; that is a "selling point" of RDF itself. At the moment, the URI "providers" for language information are scattered across organizations and research groups. There are also open questions, such as how to validate language tags: BCP 47 solves this, but the URI version(s) of language tags do not.
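On the validation point: BCP 47 distinguishes "well-formed" tags (matching the RFC 5646 grammar) from "valid" ones (whose subtags are also registered). A simplified well-formedness check can be sketched in Python as a regular expression. It approximates the RFC 5646 `langtag` production, deliberately skips whole-tag private use (`x-...`) and grandfathered tags, and the names are hypothetical; validity would additionally need a registry lookup.

```python
import re

# Approximation of the RFC 5646 "langtag" production (not the full grammar).
LANGTAG = re.compile(
    r"""
    (?:[a-z]{2,3}(?:-[a-z]{3}){0,3} | [a-z]{4} | [a-z]{5,8})   # language (+ extlang)
    (?:-[a-z]{4})?                                             # script
    (?:-(?:[a-z]{2} | [0-9]{3}))?                              # region
    (?:-(?:[a-z0-9]{5,8} | [0-9][a-z0-9]{3}))*                 # variants
    (?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*                       # extensions (singleton != x)
    (?:-x(?:-[a-z0-9]{1,8})+)?                                 # private use
    """,
    re.VERBOSE | re.IGNORECASE,
)


def is_well_formed(tag: str) -> bool:
    """True if the whole tag matches the simplified grammar; says nothing about validity."""
    return LANGTAG.fullmatch(tag) is not None


for t in ["pt-BR", "sr-Latn-RS", "de-u-co-phonebk", "pt_BR", "en-a"]:
    print(t, is_well_formed(t))
```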
A lot of this discussion has to do with understanding about
Since the RDF community has no single accepted provider of URIs, it is hard to get the right stakeholders to the table.
A next step for the BCP 47 community could be to fill a gap: provide URIs for the entries of the Language Subtag Registry. That way, more understanding of BCP 47 could be brought to the RDF community, and W3C and/or IETF could be recognized as the proper stakeholders for this task.
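A minimal Python sketch of what such URIs might look like, under loudly hypothetical assumptions: each registry entry becomes an IRI under a made-up namespace (`https://example.org/language-subtag/` is a placeholder, not something W3C or IETF publishes), typed with a hypothetical per-Type class, labelled with the entry's `Description` via `rdfs:label`, and serialized as N-Triples. The three entries are copied by hand in reduced form.

```python
from urllib.parse import quote

# Hypothetical namespace: a placeholder, not an existing official one.
BASE = "https://example.org/language-subtag/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# A few registry entries reduced to (Type, Subtag, Description).
ENTRIES = [
    ("language", "pt", "Portuguese"),
    ("script", "Latn", "Latin"),
    ("region", "BR", "Brazil"),
]


def subtag_iri(kind: str, subtag: str) -> str:
    # Lowercase so that "BR" and "br" (equivalent in BCP 47) share one IRI.
    return f"{BASE}{kind}/{quote(subtag.lower())}"


def literal(text: str) -> str:
    # Escape characters that may not appear raw inside an N-Triples string literal.
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_ntriples(entries) -> list[str]:
    lines = []
    for kind, subtag, description in entries:
        subject = f"<{subtag_iri(kind, subtag)}>"
        cls = f"<{BASE}{kind.capitalize()}>"  # hypothetical class per registry Type
        lines.append(f"{subject} <{RDF_TYPE}> {cls} .")
        lines.append(f"{subject} <{RDFS_LABEL}> {literal(description)} .")
    return lines


print("\n".join(to_ntriples(ENTRIES)))
```

Lowercasing subtags in the IRI is just one possible design choice, motivated by BCP 47 being case-insensitive; the conventional casing (`Latn`, `BR`) could still be kept in a label. Range entries in the registry (such as private-use ranges) would need separate treatment.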