Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Base direction #165

Merged
merged 19 commits into from
Oct 10, 2019
Merged

Base direction #165

merged 19 commits into from
Oct 10, 2019

Conversation

gkellogg
Copy link
Member

@gkellogg gkellogg commented Oct 7, 2019

@gkellogg gkellogg requested review from dlongley and pchampin and removed request for dlongley October 7, 2019 23:08
@gkellogg
Copy link
Member Author

gkellogg commented Oct 7, 2019

I have all the algorithm updates except for to/from RDF (a bit of fromRdf is in).

Changes to term selection, in particular, but also value expansion and compaction are fairly major, and really need a thorough review. But, it is based on updates to my implementation, so I believe the core logic is sound.

Copy link
Member

@rubensworks rubensworks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two minor comments on the tests:

Some more tests to check invalid language tags and directions may be valuable (separately, and in combination),
such as https://w3.org/ns/i18n#en-us_abc, https://w3.org/ns/i18n#a b_rtl and https://w3.org/ns/i18n#a b_abc.

Minor typo in several test manifests "datatiype" -> "datatype"

Copy link
Contributor

@pchampin pchampin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I skimmed the parts concerning language maps, as I'm not familiar enough with this part of the algorithm to really appreciate the impact of the change.

In the "Value compaction" algorithm, I find steps 7-8-9 very verbose (and yet still incomplete, see my comments). May be I'll make a counter proposal at some point if I find a simpler way to express them.

But all in all, looks good. Thanks for this work.

index.html Outdated Show resolved Hide resolved
index.html Show resolved Hide resolved
index.html Outdated Show resolved Hide resolved
index.html Outdated Show resolved Hide resolved
index.html Outdated Show resolved Hide resolved
index.html Outdated
<var>active property</var>, return the value associated with the
<li class="changed">Otherwise, if <var>value</var> has both `@language` and `@direction <a>entries</a>,
and the `@language` <a>entry</a> matches either the <a>language mapping</a> of
<var>active property</var> or the <a>default language</a> of <var>active context</var>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't that be

  • it matches the language mapping of the active property OR
  • the language mapping of the active property is null and it matches the default language of the active context
    ?

Same for the the direction?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this was resolved in subsequent changes, can you verify?

index.html Outdated Show resolved Hide resolved
index.html Outdated Show resolved Hide resolved
index.html Outdated Show resolved Hide resolved
index.html Outdated Show resolved Hide resolved
@gkellogg
Copy link
Member Author

gkellogg commented Oct 8, 2019

@rubensworks Updated your typo. Can you suggest some specific tests? I can see https://w3.org/ns/i18n#en-us_abc, but https://w3.org/ns/i18n#a b_rtl and others aren't valid IRIs.

I could see combinations of invalid language tags and invalid base directions; not sure if we are checking language tags against BCP47 in any tests, actually. We'd also need to check when converting from RDF.

I could see the following combinations, used in both expansion tests, toRdf tests, and fromRdf tests where decoding both @langugage, and https://w3.org/ns/i18n# and rdf:language/rdf:direction:

  • language tags: a+b, a.b, en-1, etc.
  • base direction a, foo, ...

But, I'm not sure we should be putting any real requirements for implementations to completely validate BCP47 other than some obvious problems like illegal characters ('+', '!', '~', %). But, it's a slippery slope.

@gkellogg
Copy link
Member Author

gkellogg commented Oct 8, 2019

In the "Value compaction" algorithm, I find steps 7-8-9 very verbose (and yet still incomplete, see my comments). May be I'll make a counter proposal at some point if I find a simpler way to express them.

Yes, it is pretty bad. I think this is a case where we go away from precise algorithm to describing the intentions:

Compact to a string, if any values of @language or @direction match those from the context, considering specific values in the term definition or defaults in the context and if the expanded value is not missing values for @language or @direction that should be there based on the term definition and defaults in the context.

Or, words to that effect.

@gkellogg
Copy link
Member Author

gkellogg commented Oct 9, 2019

In the "Value compaction" algorithm, I find steps 7-8-9 very verbose (and yet still incomplete, see my comments). May be I'll make a counter proposal at some point if I find a simpler way to express them.

I simplified value compaction, but there may be some corner cases, so I need to update my implementation to see if this handles everything, but it should read much better.

@gkellogg gkellogg requested a review from pchampin October 9, 2019 00:23
@gkellogg gkellogg marked this pull request as ready for review October 9, 2019 00:24
…ult language_) to just use _language mapping_ or _default language_, as there is no encoding.
index.html Show resolved Hide resolved
Copy link
Contributor

@dlongley dlongley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow! Lots of changes here. That being said, they seem ok to me (modulo some comments I left). I'm obviously saying this without having implemented.

Also, I'm not a fan of the compound literal approach and would probably like to see DB's implementations only support the i18n type approach if we support anything other than dropping @direction to start when going to RDF. We don't want the technical debt of the compound literal approach so may not support it unless there's clear uptake there. Not sure if anything additional here needs to be marked at risk.

index.html Outdated Show resolved Hide resolved
index.html Outdated Show resolved Hide resolved
index.html Outdated Show resolved Hide resolved
index.html Outdated Show resolved Hide resolved
index.html Show resolved Hide resolved
index.html Outdated Show resolved Hide resolved
@gkellogg
Copy link
Member Author

gkellogg commented Oct 9, 2019

Wow! Lots of changes here. That being said, they seem ok to me (modulo some comments I left). I'm obviously saying this without having implemented.

Also, I'm not a fan of the compound literal approach and would probably like to see DB's implementations only support the i18n type approach if we support anything other than dropping @direction to start when going to RDF. We don't want the technical debt of the compound literal approach so may not support it unless there's clear uptake there. Not sure if anything additional here needs to be marked at risk.

We should probably add at risk to both of these algorithms here, and in the syntax document, so they don't come back to bite us if there's not sufficient implementation support, or the community otherwise rejects them.

Looking at the implementation, the compound literal mechanism leverages off of the list support, but is certainly much more heavyweight. I wouldn't be sad if we settled to just use the i18n-datatype mechanism.

index.html Show resolved Hide resolved
index.html Outdated Show resolved Hide resolved
index.html Outdated Show resolved Hide resolved
@iherman
Copy link
Member

iherman commented Oct 10, 2019

We should probably add at risk to both of these algorithms here, and in the syntax document, so they don't come back to bite us if there's not sufficient implementation support, or the community otherwise rejects them.

While I agree with this, I wonder whether, at this point, we should not consider these as non-normative, with a note that future versions of JSON-LD would settle on a normative version (or versions) depending on the community's reaction. I am not sure about it, but I still feel a bit uneasy about this WG suggesting normatively a solution in a space that goes beyond JSON-LD proper.

@iherman
Copy link
Member

iherman commented Oct 10, 2019

On the ...i18n#lt_dir option, although the specification itself is correct, we may want to emphasize (e.g., in a note) that the lt part may be empty, i.e., an ...i18n#_ltr is a correct datatype.

@rubensworks
Copy link
Member

@gkellogg

Can you suggest some specific tests? I can see https://w3.org/ns/i18n#en-us_abc, but https://w3.org/ns/i18n#a b_rtl and others aren't valid IRIs.

Indeed, it was my goal to make them invalid IRIs :-) (they would definitely be applicable for the toRdf tests)
But you're right that if we go this route, then we should probably (also?) include valid IRIs, but invalid language tags, as you suggested below.

I could see combinations of invalid language tags and invalid base directions; not sure if we are checking language tags against BCP47 in any tests, actually. We'd also need to check when converting from RDF.

There is indeed one test that checks the validity of language tags.

I could see the following combinations, used in both expansion tests, toRdf tests, and fromRdf tests where decoding both @langugage, and https://w3.org/ns/i18n# and rdf:language/rdf:direction:

language tags: a+b, a.b, en-1, etc.
base direction a, foo, ...

👍

But, I'm not sure we should be putting any real requirements for implementations to completely validate BCP47 other than some obvious problems like illegal characters ('+', '!', '~', %). But, it's a slippery slope.

We do it for IRIs, so in that sense it also makes sense to do it for BCP47. I don't have a strong opinion on this though, so perhaps this can be left open for a future JSON-LD version.
But if we decide to do proper BCP47 validation, then I'd be happy to contribute some tests for this.

@gkellogg
Copy link
Member Author

For the tests to work, all IRIs need to be parsable as N-Quads, so we can't effectively test truly illegal IRIs there.

The issue of whether we should validate language tags should be discussed on a call, as the current text only says that processors MAY warn on invalid language tags.

@gkellogg
Copy link
Member Author

Indeed, it was my goal to make them invalid IRIs :-) (they would definitely be applicable for the toRdf tests)
But you're right that if we go this route, then we should probably (also?) include valid IRIs, but invalid language tags, as you suggested below.

As I noted, using invalid IRIs is problematic, as you can't parse an N-Quads document with an invalid IRI.

I could see combinations of invalid language tags and invalid base directions; not sure if we are checking language tags against BCP47 in any tests, actually. We'd also need to check when converting from RDF.

There is indeed one test that checks the validity of language tags.

I could see the following combinations, used in both expansion tests, toRdf tests, and fromRdf tests where decoding both @langugage, and https://w3.org/ns/i18n# and rdf:language/rdf:direction:

language tags: a+b, a.b, en-1, etc.
base direction a, foo, ...

👍

But, I'm not sure we should be putting any real requirements for implementations to completely validate BCP47 other than some obvious problems like illegal characters ('+', '!', '~', %). But, it's a slippery slope.

We do it for IRIs, so in that sense it also makes sense to do it for BCP47. I don't have a strong opinion on this though, so perhaps this can be left open for a future JSON-LD version.
But if we decide to do proper BCP47 validation, then I'd be happy to contribute some tests for this.

As it happens, [RDF Concepts[(https://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal) does require that language tags be well formed according to BCP47:

... a non-empty language tag as defined by [BCP47]. The language tag *must be well-formed according to section 2.2.9 of [BCP47].

The other RDF syntaxes use something like the following to validate language tags:

[144s] LANGTAG                        ::= "@" [a-zA-Z]+ ( "-" [a-zA-Z0-9]+ )* 

I could see us changing from a MAY validate language tags to MUST be valid either to [a-zA-Z]+ ( "-" [a-zA-Z0-9]+ )*, or the more strict ABNF from BCP47:

   obs-language-tag = primary-subtag *( "-" subtag )
   primary-subtag   = 1*8ALPHA
   subtag           = 1*8(ALPHA / DIGIT)

This should probably be done in context processing, expansion, and fromRDF algorithms to be consistent. The RDF Concepts regex is probably the one to use, as we're in that family.

I created issue #167 for this.

@gkellogg
Copy link
Member Author

I'm going to merge this, as all review comments have been cleared, and the remaining issue of validating language tags has been raised in #167.

@gkellogg gkellogg merged commit 00c32db into master Oct 10, 2019
@gkellogg gkellogg deleted the base-direction branch October 10, 2019 21:06
Copy link
Contributor

@pchampin pchampin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just reviewed the "Value compaction" allogrithm (I think this was the most signigicant change since my last review). It looks much better now 👍

index.html Show resolved Hide resolved
index.html Show resolved Hide resolved
<li>Otherwise, if <var>number entries</var> is <code>1</code> and
the <a>type mapping</a> of <var>active property</var>
is set to <code>@vocab</code>, return the result of using the
<li>Otherwise, the <a>type mapping</a> of <var>active property</var>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<li>Otherwise, the <a>type mapping</a> of <var>active property</var>
<li>Otherwise, if the <a>type mapping</a> of <var>active property</var>

index.html Show resolved Hide resolved
</ol>
</li>
<li>Otherwise, if <var>value</var> has an <code>@type</code> <a>entry</a> whose
value matches the <a>type mapping</a> of <var>active property</var>,
return the value associated with the <code>@value</code> <a>entry</a>
set <var>result</var> to the value associated with the <code>@value</code> <a>entry</a>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

Suggested change
set <var>result</var> to the value associated with the <code>@value</code> <a>entry</a>
return the value associated with the <code>@value</code> <a>entry</a>

</ol>
</li>
<li>If the value of `@language` in <var>item</var> exactly matches <var>language</var>,
if it is not `null`, or is not present, if it is `null`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be clearer

Suggested change
if it is not `null`, or is not present, if it is `null`,
if it is not `null`, or is not present, if <var>language</var> is `null`,

<li>If the value of `@language` in <var>item</var> exactly matches <var>language</var>,
if it is not `null`, or is not present, if it is `null`,
<span class="changed">and the value of `@direction` in <var>item</var> exactly matches <var>direction</var>,
if it is not `null`, or is not present, if it is `null`</span>:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be clearer

Suggested change
if it is not `null`, or is not present, if it is `null`</span>:
if it is not `null`, or is not present, if <var>direction</var> is `null`</span>:

or <var>value</var> has no `@index` <a>entry</a>,
set <var>result</var> to the value associated with the `@value` <a>entry</a>.</li>
</ol>
return <var>value</var>.</li>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be a remains of a previous version

Suggested change
return <var>value</var>.</li>

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is now "return result".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants