language stem should respect langMatches semantics #71
Comments
Resolved with 20170915 meeting Resolution: change language tag matching to follow RFC4647 per voted by: Andra, Kat, ericP, tom |
need feedback from @VladimirAlexiev on spec changes and tests before closing. Note that the issue demo fails on master ( |
Spec sounds good, I like the ref to https://tools.ietf.org/html/rfc4647#section-3.3.1. Maybe say that Tests look correct, but:
Cheers @ericprud ! |
I was going to do a separate PR to add "*" to the grammar a la
I tried to find two region codes where one was a substring of the other. Do you know where I can find the canonical list of regions? I picked a valid three-letter ISO region code ("bel"). I guess I could switch from FR to DE and use the example from RFC4647 basic match. Re: case variation, true. Early on, I had data files like |
Regions: https://docs.google.com/spreadsheets/d/1M1yv9aBUmc-NyCJX69vOLUmH2uIglSwmDwgRgByI1AI/edit#gid=2001354273 and filter by type=region. But if there were, the matching is still the same: next should come dash or end of string. I.e. What do you want with
!!!!! Because And the star would add more complications |
Re case sensitivity, I varied the case in the data and the schema. The latter raised a round-tripping issue to RDF. I invite you to review those PRs. |
It is our belief that the semantics in ShEx 2.1 § 5.4.6 Values Constraint address this. Please close this issue if you agree. |
I've read the section and I think it addresses this by reference to other standards. In particular I like: In other words, one is not supposed to use an incomplete stem like |
The following shape:
:SpanishProduct { schema:label [ @es~ ] }
declares that products must have a label in Spanish or any variant of it (e.g. es-ES vs es-AR).
But LanguageStem is defined as simple prefix match (http://shex.io/shex-semantics/#nodeIn):
It has these defects:
- It matches "Carro"@ese, where ese is Ese Ejja, and I don't think those people got cars ;-)
- It does not match "Carro"@ES, but lang tags are defined to be case-insensitive.
- (st should refer to s)
Instead of simple prefix match, it should comply with https://www.w3.org/TR/sparql11-query/#func-langMatches semantics. RFC4647 covers tags with lang, script, dialect, region etc. subtags, and specifies that matching is case-insensitive. Assuming s doesn't end in - and assuming . represents concat, it can be defined e.g. like:
regex(l, "(^".s."$)|(^".s."-)", "i")
Note: a simpler regex would be "^".s."($|-)" but I don't believe the last part of it is valid.
Aside: https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry is a bit unreadable. The script https://gist.github.com/VladimirAlexiev/8733439 turns it into this more readable Google sheet.
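To make the difference concrete, here is a rough Python sketch comparing simple prefix match with the regex proposed above. The function names are made up for illustration, and Python's re module stands in for the SPARQL/XPath regex flavour, so whether the simpler pattern is valid there is not settled by this.

```python
import re


def prefix_match(tag: str, stem: str) -> bool:
    """Simple prefix match, i.e. the behaviour this issue complains about."""
    return tag.startswith(stem)


def stem_match(tag: str, stem: str) -> bool:
    """Proposed regex: the stem must be followed by end of string or '-'.

    Case-insensitive, mirroring the "i" flag in the proposal.
    Assumes the stem does not end in '-'.
    """
    s = re.escape(stem)
    return re.search(f"(^{s}$)|(^{s}-)", tag, re.IGNORECASE) is not None


def stem_match_simple(tag: str, stem: str) -> bool:
    """The shorter variant from the note, as Python's re accepts it."""
    s = re.escape(stem)
    return re.search(f"^{s}($|-)", tag, re.IGNORECASE) is not None


if __name__ == "__main__":
    for tag in ["es", "es-AR", "ES", "ese"]:
        print(tag, prefix_match(tag, "es"), stem_match(tag, "es"),
              stem_match_simple(tag, "es"))
```

With stem es, the tag ese passes the prefix match but fails both regex variants, while ES fails the prefix match and passes both. In Python the two regex variants agree on these inputs.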
TEST: @ericprud gave this example URL. For me, it doesn't load the test on first load (or control-shift-R) but loads it on second refresh (control-R):
http://rawgit.com/shexSpec/shex.js/master/doc/shex-simple.html?schema=%3CS%3E%20%7B%20%3Cp%3E%20%5B%40aa~%5D%20%7D&data=%3Cexact%3E%20%3Cp%3E%20%22exact%22%40aa%20.%0A%3Csub%3E%20%3Cp%3E%20%22sub%22%40aa-ES%20.%0A%3CshouldFail%3E%20%3Cp%3E%20%22shouldFail%22%40aaa-ES%20.%0A&shape-map=%7BFOCUS%20%3Cp%3E%20_%7D%40%3CS%3E
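For anyone who doesn't want to read the percent-encoding by eye, this small Python snippet (standard library only, not part of shex.js) decodes the schema, data and shape-map parameters from the test URL above.

```python
from urllib.parse import urlsplit, parse_qs

# The test URL from this issue, split over several lines for readability.
TEST_URL = (
    "http://rawgit.com/shexSpec/shex.js/master/doc/shex-simple.html"
    "?schema=%3CS%3E%20%7B%20%3Cp%3E%20%5B%40aa~%5D%20%7D"
    "&data=%3Cexact%3E%20%3Cp%3E%20%22exact%22%40aa%20.%0A"
    "%3Csub%3E%20%3Cp%3E%20%22sub%22%40aa-ES%20.%0A"
    "%3CshouldFail%3E%20%3Cp%3E%20%22shouldFail%22%40aaa-ES%20.%0A"
    "&shape-map=%7BFOCUS%20%3Cp%3E%20_%7D%40%3CS%3E"
)

# parse_qs percent-decodes each value and returns a list per parameter name.
params = parse_qs(urlsplit(TEST_URL).query)

for name in ("schema", "data", "shape-map"):
    print(f"--- {name} ---")
    print(params[name][0])
```

The schema uses the stem @aa~, and the data has three literals tagged aa, aa-ES and aaa-ES. Going by the node names (exact, sub, shouldFail), only the last one is meant to be rejected.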