Introducing @dir ? #583

Closed
iherman opened this issue Feb 1, 2018 · 28 comments
Labels
api defer Issue deferred to future Working Group spec-design syntax

Comments

@iherman
Contributor

iherman commented Feb 1, 2018

In some situations it is important/necessary to include the base direction of a text, alongside its language; see the “Requirements for Language and Direction Metadata in Data Formats” for further details. In practice, in vanilla JSON, this would require something like:

"title": [ { "value": "Moby Dick", "lang": "en" },
           { "value": "موبي ديك", "lang": "ar", "dir": "rtl" }
         ]

(the example comes from that document).

At this moment, I believe the only way you can reasonably express that in JSON-LD is via cheating a bit:

"title": [ { "@value": "Moby Dick", "@language": "en" },
           { "@value": "موبي ديك", "@language": "ar", "dir": "rtl" }
         ]

and making sure that the dir term is not defined in the relevant @context so that, when generating the RDF output, that term is simply ignored. But that also means that there is no round-tripping, that term will disappear after expansion.
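The loss of the dir term can be illustrated with a toy model. This is only a sketch, not a real JSON-LD processor: the `expand_value_object` helper and its keyword set are assumptions made for illustration, and it models just one rule, namely that keys which are neither keywords nor mapped by the context are dropped on expansion.

```python
# Toy model of the behaviour described above; NOT a real JSON-LD processor.
# Hypothetical helper: keys that are not keywords (and are not mapped by any
# context) are silently dropped when the value object is expanded.
VALUE_OBJECT_KEYWORDS = {"@value", "@language", "@type", "@index"}


def expand_value_object(value_obj):
    """Return a copy of the value object with unmapped keys removed."""
    return {k: v for k, v in value_obj.items() if k in VALUE_OBJECT_KEYWORDS}


title = {"@value": "موبي ديك", "@language": "ar", "dir": "rtl"}
expanded = expand_value_object(title)

# The text and language survive, but the direction is gone for good.
print(expanded)
```

Once the document has been expanded there is nothing left from which "dir" could be restored, which is the round-tripping problem mentioned above.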

The difficulty lies in the RDF layer, in fact; RDF does not have any means (alas!) to express text direction. On the other hand, this missing feature is a general I18N problem whenever JSON-LD is used (there were issues when developing the Web Annotation Model, these issues are popping up in the Web Publication work, etc.).

Here is what I would propose as an incomplete solution:

  1. Let us introduce a @dir term, alongside @language. This term can be used in place of dir above, i.e., it is a bona fide part of a string representation; it would therefore be kept in the compaction/expansion steps and could also be used for framing.
  2. In JSON-LD 1.1, @dir is ignored when transforming into RDF, i.e., only the language tag would be used.
  3. We may initiate some work in the RDF community to solve this issue. There may be several ways, each of which requires the RDF community to chime in:
    3.1. Define a mechanism of "parametrized" standard datatypes that represent a (language, direction) pair. One would then get something like: [] ex:title "موبي ديك"^^rdf:internationalText(ar,rtl) ;
    3.2. Go for a "generalized" RDF where strings can also appear as subjects (that has been a matter of dispute for a long time...). That would make it possible to attach attributes such as direction to texts.
    3.3. Some other mechanism that I cannot think of.
  4. In a future JSON-LD 1.*, the @dir value can be properly mapped onto an RDF representation reflecting the right choices (if such choices are worked out).

Cc: @BigBlueHat @r12a

@gkellogg
Member

gkellogg commented Feb 1, 2018

In general, I think this is a good idea; having a mechanism in place that could map to a future RDF standard for representing text direction is forward looking.

Having keywords which are ignored in the RDF transformation is not unprecedented: @index is also ignored.

There are also implications for term definitions and term selection when compacting:

  1. @dir might appear in a term definition, and be used to match values with the appropriate @dir.
  2. @dir could appear at the top level of a context, to set the default text direction.

Basically, most everything that is done with @language could be done with @dir (dir maps?)
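To make the analogy with @language concrete, here is a minimal sketch of how a processor might pick the direction for a value. Everything in it is hypothetical: @dir is only a proposal at this point, and the `effective_direction` helper and the precedence order (value object, then term definition, then context default) are assumptions modelled on how @language is resolved.

```python
# Hypothetical precedence for the proposed @dir keyword, modelled on how
# @language is resolved: the value object wins, then the term definition,
# then the context-wide default. None means "no direction stated".
def effective_direction(value_obj, term_definition=None, context=None):
    for source in (value_obj, term_definition or {}, context or {}):
        if "@dir" in source:
            return source["@dir"]
    return None


context = {"@language": "ar", "@dir": "rtl"}    # context-wide default
term_definition = {"@id": "http://schema.org/name", "@dir": "ltr"}

explicit = effective_direction({"@value": "abc", "@dir": "rtl"}, term_definition, context)
from_term = effective_direction({"@value": "abc"}, term_definition, context)
from_context = effective_direction({"@value": "abc"}, None, context)
unspecified = effective_direction({"@value": "abc"})
```

Term selection during compaction would need the inverse: choosing the term whose definition matches the @dir of the value.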

@gkellogg gkellogg added this to the JSON-LD 1.1 milestone Feb 1, 2018
@dlongley
Member

dlongley commented Feb 1, 2018

Just a quick bikeshedding note -- should we decide to support this feature, I think it would be better to use something longer that is more specific than @dir, e.g. @direction (better in that it is less ambiguous but still not great, IMO ... direction of what?).

@iherman
Contributor Author

iherman commented Feb 2, 2018

@dlongley "direction" is the usual term in the i18n circles (for "base writing direction"); I think that, paired with @language, it is fine.

@iherman
Contributor Author

iherman commented Feb 2, 2018

@gkellogg I was wondering about @direction maps, but I would have difficulty coming up with a use case for them. I would not want to add them to the spec just because we can...

@azaroth42
Contributor

I'm nervous about introducing something that doesn't round trip through RDF. Either RDF is our conceptual model, or it isn't. This would open the door to also allowing @type at the same time as @language, despite the fact that language-tagged strings in RDF 1.1 have an explicit datatype.

Not -1, but I think we need to consider the slipperiness of the slope we're starting to slide down.

@BigBlueHat
Member

Echoing @azaroth42 I'd prefer (if such things are even conceivable) that we iterate RDF to include text direction encoding.

Additionally, this doesn't address BiDi strings...nor can it (afaict):
https://www.w3.org/International/wiki/Html-bidi-isolation
https://www.w3.org/International/questions/qa-bidi-controls

@iherman
Contributor Author

iherman commented Feb 3, 2018

@azaroth42, @BigBlueHat :

  1. The problem with RDF round-tripping makes me uneasy, too. However, the problem I see is that there is, at this moment, no WG/group/whatever that could authoritatively solve the RDF aspect of the problem, although that has been banging on our door for a while. I do not see any realistic prospect of setting up a WG on this. Our only avenue would be to raise the issue on the SW IG, and have some ad-hoc solution getting some level of consensus (although that can take a long time), but that would not have any normative status. (Probably the quickest solution would be to define a datatype, as I sketched in the original comment.)
  2. @BigBlueHat, it is perfectly true that this does not cover detailed bidi control. But that would be way more difficult, because it would lead to a reconstruction of what HTML does and would also require imposing a particular syntax within the literal itself. In some sense, using the rdf:HTML datatype may be the best solution, but the comment in the string-meta document (but also this) is very relevant to the practical difficulties of this approach.

The reason why I raised this issue, in spite of sharing the same reservations that both of you have, is purely pragmatic: communities face this issue and we simply do not have a satisfactory solution. And, I presume, the "perfect is an enemy of the good" principle may apply...

@iherman
Contributor Author

iherman commented Feb 8, 2018

After some discussion with colleagues, my attention was drawn to the approach taken by the Activity Streams Rec, which is, essentially, to represent a text with a base direction by injecting a Unicode BiDi control character at the beginning of the string (\u200F and \u200E for RTL and LTR, respectively). The advantage is that this works with RDF without further ado.

I am not convinced this is really good for authoring a text. However, in JSON-LD 1.1 we have the freedom of using "our" syntax, i.e.,

"title": [ { "@value": "Moby Dick", "@language": "en" }, { "@value": "موبي ديك", "@language": "ar", "@direction": "rtl" } ]

We would then specify that, when generating an RDF literal, the value of @direction is mapped onto \u200F or \u200E, which is prepended to the string.

I would still wait for the opinion of the I18N experts, but that would be an easy way of adding this missing feature to JSON-LD.
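The mapping suggested above could look roughly like the following. This is a minimal sketch under stated assumptions: `to_rdf_literal` is a hypothetical helper, the literal is modelled as a plain (lexical form, language tag) pair, and @direction is the proposed keyword rather than an existing one.

```python
RLM = "\u200F"  # RIGHT-TO-LEFT MARK, prepended for "rtl"
LRM = "\u200E"  # LEFT-TO-RIGHT MARK, prepended for "ltr"
MARKS = {"rtl": RLM, "ltr": LRM}


def to_rdf_literal(value_obj):
    """Hypothetical helper: return (lexical form, language tag) for a value object.

    If the proposed @direction key is present, the matching Unicode mark is
    prepended to the lexical form; otherwise the text is left unchanged.
    """
    text = value_obj["@value"]
    direction = value_obj.get("@direction")
    if direction is not None:
        text = MARKS[direction] + text  # KeyError for anything but "rtl"/"ltr"
    return text, value_obj.get("@language")


lexical, lang = to_rdf_literal(
    {"@value": "موبي ديك", "@language": "ar", "@direction": "rtl"}
)
plain, plain_lang = to_rdf_literal({"@value": "Moby Dick", "@language": "en"})
```

The resulting literal is an ordinary language-tagged string, so nothing in the RDF layer needs to change.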

Cc: @azaroth42 @BigBlueHat

@lisp

lisp commented Feb 8, 2018 via email

@iherman
Contributor Author

iherman commented Feb 8, 2018

This is used exclusively for text literals; it is in the same category as @language. It has no bearing on URL-s or numeric values.

in what way does adding this dimension to this encoding in order to represent a dimension in the respective value domain improving the capacity or facility of the encoding itself to represent linked data?

I am not sure I understand the question... but I will try to see if I understand it right: the problem is not really related to the "linked" aspect of linked data, but to the literal values that are added to the Linked Data Cloud. A typical example would be the short description or the title of the book. The problem arises if the text includes a mixture of right-to-left and left-to-right text and we want to be sure that the consumer of the data (say, a program displaying the title and the description) does a proper job in terms of punctuation.

https://www.w3.org/International/articles/inline-bidi-markup/ describes some of the problem and the reason why it is necessary, in some cases, to have an explicit base direction (which is possible in HTML).

(As commented by @BigBlueHat, this does not solve all the issues, and we should not really try to do that; it would lead to reinventing part of HTML.)

Cc @BigBlueHat @r12a

@lisp

lisp commented Feb 8, 2018 via email

@workergnome

workergnome commented Feb 8, 2018 via email

@gkellogg
Member

gkellogg commented Feb 8, 2018

It may be that the best solution, for both JSON-LD and RDF, is to simply rely on rdf:HTML to handle these use cases.

In any case, if it were to make its way into the RDF Abstract Syntax, there would be a much wider debate from the RDF community, and past experience is that you can't really predict where these things will come out.

It may have more consequences for SPARQL, which is more in need of an update than RDF, IMHO. Querying and emitting text direction would be non-trivial.

@iherman
Contributor Author

iherman commented Feb 9, 2018

@gkellogg

It may be that the best solution, for both JSON-LD and RDF, is to simply rely on rdf:HTML to handle these use cases.

Yes, I did discuss that with users. However, essentially requiring the consumer of the data to use an HTML parser makes things fairly complex on that end, so there was push-back. I agree that a complex situation mixing many different types of data may have to use that solution, but having a simple solution for the simple case is really necessary imho.

In any case, if it were to make its way into the RDF Abstract Syntax, there would be a much wider debate from the RDF community, and past experience is that you can't really predict where these things will come out.

I would keep away from getting into the RDF Abstract syntax case. There may be a solution in defining a datatype for that purpose which would not change the abstract syntax, but even defining that would be beyond what this group would/should do.

However, the very pragmatic solution used by the Activity Streams people does not use any extra feature, as far as RDF goes: adding a few Unicode characters to a literal is perfectly within the framework of today's RDF. In fact, JSON-LD could fully ignore the whole issue and rely on users applying this trick even for literals used within JSON-LD. My fear is that deployment of such a solution would be hampered by the difficulty of editing such text; introducing @direction could be seen as a syntactic help for the end users.

It may have more consequences for SPARQL, which is more in need of an update than RDF, IMHO. Querying and emitting text direction would be non-trivial.

True. But this is not something that I see happening in the coming years...

@iherman
Contributor Author

iherman commented Feb 9, 2018

@workergnome

So perhaps the essential detail of strings that is not currently captured is not text direction, per se, but an encoding of the script in which the text is written. Script is another characteristic of string data, somewhat orthogonal to language (as per https://www.w3.org/International/questions/qa-scripts#which, which points out that the language Azeri can be written in either Latin or Arabic script).

But this is completely covered by the language tag already, ie, this is not a problem. For a specific example, the tag zh-Hans denotes a text in Chinese, using the simplified script (used in mainland China and in Singapore), whereas zh-Hant is Chinese using traditional script as used in Taiwan or Hong Kong. (BCP 47 is surprisingly complex but powerful, see https://www.w3.org/International/articles/language-tags/).

As far as I know, there's a one-to-one mapping between direction and script, though I'm often proven wrong by the complexities of human language.

And most of the time there is indeed. But there are corner cases in which additional information on the base direction must be specified. These are the corner cases that necessitate the rtl/ltr flags in HTML5. See https://www.w3.org/International/articles/inline-bidi-markup/

I'm still opposed to encoding data into JSON-LD syntax that isn't natively supported in other RDF serializations, though.

Yes. Even if the RDF Concepts document is perfectly fine with the solution used by the Activity Streams standard, ideally some syntactic sugar would be good to have in Turtle, too. Pragmatically speaking, there are two serializations used for RDF these days on a large scale: JSON-LD and Turtle. JSON-LD has a tendency (by design) to be used by RDF laypersons, or even people ignoring RDF, for whom an easy syntactic sugar would be welcome. (And I do not see any chance of reopening the RDF WG at W3C for this as of now.)

(I could have added RDFa, but due to the specific environment of RDFa those users could more easily fall back on the more complete, but complex approach of using the rdf:HTML datatype.)

@lisp

lisp commented Feb 9, 2018 via email

@iherman
Contributor Author

iherman commented Feb 9, 2018

@lisp,

Thanks for the questions, it clarifies the intention. (This is of course all based on the supposition that we go along with the Activity Stream approach; the original proposal in the issue did not do that.)

what is a graph store processor to do when presented with an rdf document encoded as json-ld which includes assertions as to text direction?

The JSON-LD -> RDF processor is supposed to produce a text literal of the form "\u200Fthe original text"@en.

what will be the intent, when the feature is to be extended to turtle?

What the appropriate syntactic sugar would be in Turtle, I do not know. I have seen proposals like "the original text"@en^ltr.

how is this encoding intended to round-trip through a sparql processor?
that is, if a sparql update request loads into a graph a document which is encoded as json-ld and a subsequent query produces a graph which includes terms which that load operation introduced into the store, how do those terms come to reflect any direction specifications present in the imported document in order that it can be reflected in the response?

You are right that it creates problems with SPARQL insofar as the SPARQL query syntax would also need to have something like that (probably following the Turtle syntax) and today it is not there. I.e., the only solution would be to operate with "\u200Fthe original text"@en all along.

Note that, in JSON-LD 1.0, this is already a valid statement:

"title": {
	"@value" : "\u200Fthe original text",
	"@language" : "en"
}

A JSON-LD encoding of today's Activity Streams statement would have to do this. I.e., to come back to your question, if the user uses SPARQL with this, it is fine. The usage of @direction is purely syntactic sugar.


Just to make it clear: I do not really like this solution. But we do have a real use case to be solved: publishers may want to use JSON-LD to encode metadata like title or authors, and there is a need to express titles in different languages and, possibly, directions. At the moment, this is perfectly fine JSON-LD:

  "title": [
    {"@language": "fr", "@value": "Vingt mille lieues sous les mers"},
    {"@language": "en", "@value": "Twenty Thousand Leagues Under the Sea"},
    {"@language": "ja", "@value": "海底二万里"}
  ]

but if one wants to add a base direction, the only choice as of now is to add a \u200F or \u200E manually into the text. By saying

  "title": [
    {"@language": "fr", "@value": "Vingt mille lieues sous les mers"},
    {"@language": "en", "@value": "Twenty Thousand Leagues Under the Sea"},
    {"@language": "ja", "@value": "海底二万里", "@direction": "ltr"}
  ]

we simplify the authors' lives.

@lisp

lisp commented Feb 9, 2018 via email

@lisp

lisp commented Feb 9, 2018 via email

@iherman
Contributor Author

iherman commented Feb 9, 2018

@lisp,

it is my fault for having mixed up what are possibly three issues. Namely:

  1. Introducing the @direction into JSON-LD 1.1, without any effect on the generated RDF (working for users who use JSON-LD only in a round-trip manner, but the results are not transferred into RDF)
  2. A way of handling the issue in today's RDF by using the Unicode characters \u200F or \u200E
  3. Combining the two above by considering the @direction of (1) as a syntactic sugar for (2)

I am not sure which of the three aspects you object to. Note that using (2) is perfectly valid today, i.e., "\u200Fabcdefg" is not equal to "abcdefg", which is perfectly in line with the definitions you refer to.
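The inequality noted above, and what a consumer would have to do to recover the direction from such a literal, can be sketched as follows. The `split_direction` helper is hypothetical and only looks at the very first character of the string.

```python
RLM = "\u200F"  # RIGHT-TO-LEFT MARK
LRM = "\u200E"  # LEFT-TO-RIGHT MARK


def split_direction(lexical):
    """Hypothetical helper: return (direction, text without the leading mark).

    The direction is None when the string does not start with RLM or LRM.
    """
    if lexical.startswith(RLM):
        return "rtl", lexical[len(RLM):]
    if lexical.startswith(LRM):
        return "ltr", lexical[len(LRM):]
    return None, lexical


marked = "\u200Fabcdefg"
unmarked = "abcdefg"

# The mark is an ordinary character, so the two strings are different terms.
same_string = marked == unmarked
direction, text = split_direction(marked)
```

A store or query engine that compares literals character by character therefore treats the marked and unmarked strings as distinct values unless it strips the mark first.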

@lisp

lisp commented Feb 9, 2018 via email

@iherman
Contributor Author

iherman commented Feb 9, 2018

in the sense which is implied by the “^ltr” proposal, as i understand the provisions which those passages from my earlier note describe for string comparison in the presence of language tags, no.

Why?

@lisp

lisp commented Feb 9, 2018 via email

@BigBlueHat
Member

Consider this a late-Friday afternoon idea for encouraging us to see things differently (for whatever it may teach us).

@base <http://example.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <http://schema.org/> .
@prefix oa: <http://www.w3.org/ns/oa#> .

<#genesis>
  a schema:Book ;
  schema:name <data:text/plain,בראשית> .

<data:text/plain,בראשית> oa:textDirection oa:rtlDirection .

😸

@cwebber
Contributor

cwebber commented Feb 9, 2018

I echo previous comments that not being RDF compatible is a no-go. My reason specifically being a very practical one, as somebody who does not even spend much time on the RDF end: it would break linked data signatures, or anything else using the RDF Dataset Normalization algorithm, no? I think a lot of us want to see more usage of json-ld that works in that direction. (Or at least it might leave out information so that two different sets of information could end up having the same normalized structure... bad especially for my hopes to move towards more and more content addressed storage for linked data...)
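The concern about two different inputs ending up with the same normalized form can be illustrated with a toy example. This is only a sketch: it hashes a single hand-built N-Quads-like line instead of running the RDF Dataset Normalization algorithm, the `nquad` and `digest` helpers are hypothetical, literal escaping is ignored, and @direction is the proposed keyword.

```python
import hashlib

MARKS = {"rtl": "\u200F", "ltr": "\u200E"}


def nquad(value_obj, keep_direction):
    """Hypothetical helper: serialize one statement as an N-Quads-like line."""
    text = value_obj["@value"]
    if keep_direction and "@direction" in value_obj:
        text = MARKS[value_obj["@direction"]] + text
    lang = value_obj["@language"]
    return f'<http://example.org/book> <http://schema.org/name> "{text}"@{lang} .\n'


def digest(value_obj, keep_direction):
    """Hash the serialized line, standing in for a signature over the data."""
    return hashlib.sha256(nquad(value_obj, keep_direction).encode("utf-8")).hexdigest()


with_direction = {"@value": "موبي ديك", "@language": "ar", "@direction": "rtl"}
without_direction = {"@value": "موبي ديك", "@language": "ar"}

# Direction ignored in RDF: two different inputs collapse to the same hash.
collision = digest(with_direction, False) == digest(without_direction, False)
# Direction carried inside the literal: the inputs stay distinguishable.
distinct = digest(with_direction, True) != digest(without_direction, True)
```

With the direction dropped, a signature over the normalized data cannot detect that the direction was changed or removed.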

@cwebber
Contributor

cwebber commented Feb 9, 2018

Anyway, that's all to say that I think one of two directions is best for now:

  • Use HTML and provide markup that way
  • Take the ActivityStreams bidi character route

I think we could always revisit if/when RDF gets native direction support.

@gkellogg gkellogg added defer Issue deferred to future Working Group and removed under-review labels Mar 12, 2018
@gkellogg
Member

gkellogg commented Apr 9, 2018

Deferred to WG due to https://json-ld.org/minutes/2018-04-10/#resolution-3.

@gkellogg
Member

Closed in favor of w3c/json-ld-syntax#11.

8 participants