Introducing @dir ? #583
Comments
In general, I think this is a good idea; having a mechanism in place that could map to a future RDF standard for representing text direction is forward looking. Having keywords which are ignored in the RDF transformation is not unprecedented: There are also implications for term definitions and term selection when compacting:
Basically, most everything that is done with |
Just a quick bikeshedding note -- should we decide to support this feature, I think it would be better to use something longer that is more specific than |
@dlongley "direction" is the usual term in the i18n circles (for "base writing direction"); I think that, paired with |
@gkellogg I was wondering about |
I'm nervous of introducing something that doesn't round trip through RDF. Either RDF is our conceptual model, or it isn't. This would open the door for also adding Not -1, but I think we need to consider the slipperiness of the slope we're starting to slide down. |
Echoing @azaroth42 I'd prefer (if such things are even conceivable) that we iterate RDF to include text direction encoding. Additionally, this doesn't address BiDi strings...nor can it (afaict): |
The reason why I raised this issue, in spite of sharing the same reservations that both of you have, is purely pragmatic: communities face this issue and we simply do not have a satisfactory solution. And, I presume, the "perfect is an enemy of the good" principle may apply... |
Having had some discussion with some colleagues, my attention was drawn to the approach taken by the Activity Stream Rec, which is, essentially, to represent a text with a base direction by injecting the Unicode BiDi control characters at the beginning of the string (\u200F and \u200E for RTL and LTR, respectively). The advantage is that this works with RDF without further ado. I am not convinced this is really good for authoring a text. However, in JSON-LD 1.1 we have the freedom of using "our" syntax, i.e.,
But specifying that, when generating an RDF literal, the value of @direction should be mapped on \u200F and \u200E. I would still wait for the opinion of the I18N experts, but that would be an easy way of adding this missing feature to JSON-LD. |
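A minimal sketch of the mapping described above, assuming the keyword is called @direction and accepts only "rtl" and "ltr". The function name `value_object_to_literal` is hypothetical and not part of any JSON-LD API; it prepends U+200F or U+200E to the lexical form so that the result stays an ordinary language-tagged literal.

```python
RLM = "\u200f"  # RIGHT-TO-LEFT MARK
LRM = "\u200e"  # LEFT-TO-RIGHT MARK
_MARKS = {"rtl": RLM, "ltr": LRM}


def value_object_to_literal(value_object):
    """Return (lexical form, language tag) for a JSON-LD value object.

    Hypothetical helper: if "@direction" is present, the matching Unicode
    mark is prepended to the text; nothing else about the literal changes.
    """
    text = value_object["@value"]
    language = value_object.get("@language")
    direction = value_object.get("@direction")
    if direction is not None:
        if direction not in _MARKS:
            raise ValueError(f"unsupported direction: {direction!r}")
        text = _MARKS[direction] + text
    return text, language


title = [
    {"@value": "Moby Dick", "@language": "en"},
    {"@value": "موبي ديك", "@language": "ar", "@direction": "rtl"},
]
literals = [value_object_to_literal(v) for v in title]
```

The English title passes through unchanged; only the Arabic one gains a leading mark, which is exactly what an author would otherwise have to type by hand.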
On 2018-02-08, at 16:10, Ivan Herman wrote:
Having had some discussion with some colleagues my attention was drawn on the approach taken by the Activity Stream Rec, which is, essentially, to represent a text with a base direction by injecting the Unicode BiDi control characters at the beginning of the string (\u200F and \u200E for RTL and LTR, respectively). The advantage is that this works with RDF without further ado.
I am not convinced this is really good for authoring a text. However, in JSON-LD 1.1 we have the freedom of using "our" syntax, i.e.,
"title": [ { ***@***.***": "Moby Dick", ***@***.***": "en" }, { ***@***.***": "موبي ديك", ***@***.***": "ar" ***@***.***": "rtl"} ]
But specifying that, when generating an RDF literal, the value of @direction should be mapped on \u200F and \u200E.
in what way does adding this dimension to this encoding in order to represent a dimension in the respective value domain improve the capacity or facility of the encoding itself to represent linked data?
would one do this for iri values? for the sign of numeric values? for media types?
for any other domain where the internal structure leads to significant differences in the control flow governing presentation and/or valid operations?
… I would still wait for the opinion of the I18N experts, but that would be an easy way of adding this missing feature to JSON-LD.
Cc: @azaroth42 @BigBlueHat
|
This is used exclusively for text literals; it is in the same category as @language. It has no bearing on URL-s or numeric values.
I am not sure I understand the question... but I will try to see if I understand it right: the problem is not really related to the "linked" aspect of linked data, but to the literal values that are added to the Linked Data Cloud. A typical example would be the short description or the title of the book. The problem arises if the text includes a mixture of right-to-left and left-to-right text and we want to be sure that the consumer of the data (say, a program displaying the title and the description) does a proper job in terms of punctuation. https://www.w3.org/International/articles/inline-bidi-markup/ describes some of the problem and the reason why it is necessary, in some cases, to have an explicit base direction (which is possible in HTML). (As commented by @BigBlueHat, this does not solve all the issues, and we should not really try to do that, it would lead to reinventing the wheels of part of HTML.) Cc @BigBlueHat @r12a |
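To make the corner case concrete, here is a rough sketch of the "first strong character" heuristic a consumer might fall back on when no base direction is given. The function name `first_strong_direction` is made up for illustration; only `unicodedata.bidirectional` from the Python standard library is used. A title that is meant to be right-to-left but begins with a Latin token is guessed wrongly, which is the kind of situation where explicit metadata is needed.

```python
import unicodedata


def first_strong_direction(text):
    """Guess a base direction from the first strongly directional character.

    Returns "ltr", "rtl", or None when the text has no strong character
    (for example, digits and punctuation only).
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return None


arabic_title = "موبي ديك"
mixed_title = "W3C موبي ديك"  # intended as an RTL title, but starts with Latin

guess_arabic = first_strong_direction(arabic_title)
guess_mixed = first_strong_direction(mixed_title)
guess_neutral = first_strong_direction("123 !?")
```

The heuristic works for the plain Arabic title, but reports "ltr" for the mixed one, so the consumer has no way to recover the author's intent from the string alone.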
On 2018-02-08, at 17:13, Ivan Herman wrote:
This is used exclusively for text literals; it is in the same category as @language. It has no bearing on URL-s or numeric values.
in what way does adding this dimension to this encoding in order to represent a dimension in the respective value domain improve the capacity or facility of the encoding itself to represent linked data?
I am not sure I understand the question…
yes, i see that, where you do not understand the symmetry between @language and numeric values.
i, on the other hand, have never understood this note in the description of rdf semantics:
Language-tagged strings have the datatype IRI http://www.w3.org/1999/02/22-rdf-syntax-ns#langString. No datatype is formally defined for this IRI because the definition of datatypes does not accommodate language tags in the lexical space. The value space associated with this datatype IRI is the set of all pairs of strings and language tags.
and have understood that anomaly to have been historically determined by the origins of rdf, to "describe resources".
but I will try to see if I understand it right: the problem is not really related to the "linked" aspect of linked data, but to the literal values that are added to the Linked Data Cloud.
that is the issue to which i point.
A typical example would be the short description or the title of the book. The problem arises if the text includes a mixture of right-to-left and left-to-right text and we want to be sure that the consumer of the data (say, a program displaying the title and the description) does a proper job in terms of punctuation.
https://www.w3.org/International/articles/inline-bidi-markup/ describes some of the problem and the reason why it is necessary, in some cases, to have an explicit base direction (which is possible in HTML).
that is, whether json-ld is intended to serve as a markup medium or as a means to encode linked data.
yes i have read the suggestion, that rdf should be extended to support this aspect of markup, but i do not see why that would be a generally beneficial approach.
best regards, from berlin,
|
To echo what I believe James is saying, I think that we need to be careful
not to conflate string rendering concerns (like direction) with the
semantic concerns of string data.
So perhaps the essential detail of strings that is not currently captured
is not text direction, per se, but an encoding of the script in which text
is written. Script is another characteristic of string data, somewhat
orthogonal to language (as per
https://www.w3.org/International/questions/qa-scripts#which, which points
out that the language Azeri can be written in either Latin or Arabic
script).
As far as I know, there's a one-to-one mapping between direction and
script, though I'm often proven wrong by the complexities of human
language.
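As a rough illustration of the script idea, the sketch below pulls the script subtag out of a language tag and looks up a direction for it. Both function names and the four-entry table are assumptions made for this example; the table is deliberately tiny and says nothing about scripts it does not list, and the parsing ignores grandfathered tags.

```python
# Illustrative subset only: ISO 15924 script code -> usual base direction.
SCRIPT_DIRECTION = {
    "Arab": "rtl",
    "Hebr": "rtl",
    "Latn": "ltr",
    "Cyrl": "ltr",
}


def script_subtag(language_tag):
    """Return the four-letter script subtag of a language tag, if any."""
    subtags = language_tag.split("-")
    if len(subtags[0]) == 1:
        return None  # private-use or similar tag, no script to find
    for subtag in subtags[1:]:
        if len(subtag) == 1:
            break  # extension or private-use section starts here
        if len(subtag) == 4 and subtag.isalpha():
            return subtag.title()
    return None


def direction_for(language_tag):
    """Look up a direction from the script subtag; None when unknown."""
    return SCRIPT_DIRECTION.get(script_subtag(language_tag))


azeri_arabic = direction_for("az-Arab")
azeri_latin = direction_for("az-Latn")
azeri_bare = direction_for("az")
```

The bare tag "az" yields no answer, so this only helps when publishers actually include the script subtag.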
I'm still opposed to encoding data into JSON-LD syntax that isn't natively
supported in other RDF serializations, though.
- David Newbury
|
It may be that the best solution, for both JSON-LD and RDF, is to simply rely on In any case, if it were to make its way into the RDF Abstract Syntax, there would be a much wider debate from the RDF community, and past experience is that you can't really predict where these things will come out. It may have more consequence for SPARQL, which is in more need of an update than RDF, IMHO. Querying and emitting text direction would be non-trivial. |
Yes, I did discuss that with users. However, the complexity involved for the consumer of the data in, essentially, using an HTML parser makes it fairly complex on that end, so there was a push-back. I agree that a complex situation with lots of mixing of different types of data may have to use that solution, but having a simple solution for the simple case is really necessary imho.
I would keep away from getting into the RDF Abstract syntax case. There may be a solution in defining a datatype for that purpose which would not change the abstract syntax, but even defining that would be beyond what this group would/should do. However, the very pragmatic solution used by the Activity Stream people does not use any extra feature, as far as RDF goes: adding a few UTF-8 characters into a literal is perfectly within the framework of today's RDF. In fact, JSON-LD could fully ignore the whole issue and rely on users using this trick even for literals that are used within JSON-LD. My fear is that deployment of such solution would hit the obstacle of the difficulties editing such script; introducing
True. But this is not something that I see happening in the coming years... |
But this is completely covered by the language tag already, ie, this is not a problem. For a specific example, the tag
And most of the time it is indeed. But there are corner cases that require an additional information on the base direction that must be specified. This is the corner case that necessitates the rtl/ltr flags in HTML5. See https://www.w3.org/International/articles/inline-bidi-markup/
Yes. Even if the RDF Concept document is perfectly fine with the solution used by the Activity Stream standard, ideally a syntactic sugar would be good to have in Turtle, too. Pragmatically speaking, there are two serializations used for RDF these days on a large scale: JSON-LD and Turtle. JSON-LD has a tendency (by design) to be used by RDF laypersons, or even people ignoring RDF, for whom an easy syntactic sugar would be welcome. (And I do not see any chance of reopening the RDF WG at W3C as of now for this.) (I could have added RDFa, but due to the specific environment of RDFa those users could more easily fall back on the more complete, but complex approach of using the |
On 2018-02-09, at 07:15, Ivan Herman wrote:
...
Yes. Ideally, even if the RDF Concept document is perfectly fine with the solution used by the Activity Stream standard, ideally a syntactic sugar would be good to have in Turtle, too. Pragmatically speaking, there are two serializations used for RDF these days on a large scale: JSON-LD and Turtle. JSON-LD has tendency (by design) to be used by RDF laypersons, or even people ignoring RDF, for whom an easy syntactic sugar would be welcome. (And I do not see any chance reopening the RDF WG at W3C as of now for this.)
what is a graph store processor to do when presented with an rdf document encoded as json-ld which includes assertions as to text direction?
what will be the intent, when the feature is to be extended to turtle?
how is this encoding intended to round-trip through a sparql processor?
that is, if a sparql update request loads into a graph a document which is encoded as json-ld and a subsequent query produces a graph which includes terms which that load operation introduced into the store, how do those terms come to reflect any direction specifications present in the imported document in order that it can be reflected in the response?
i would understand there to be two options.
either one extends the dimensionality of the string term representation beyond the abstract rdf model to allow for direction or one extends the lexical form of string variations to include direction.
as the direction concerns presentation rather than semantics, neither is appropriate.
if it were to concern semantics, that would argue, that it should be reified through a predicate.
|
Thanks for the questions, it clarifies the intention. (This is of course all based on the supposition that we go along with the Activity Stream approach; the original proposal in the issue did not do that.)
The JSON-LD -> RDF processor is supposed to produce a text literal of the form
What the appropriate syntactic sugar would be in Turtle: I do not know. I have seen the proposal like
You are right that it creates problems with SPARQL insofar as the SPARQL query syntax would also need to have something like that (probably following the Turtle syntax) and today it is not there. Ie, the only solution would be to operate with "\u200Fthe original text"@en all along. Note that, in JSON-LD 1.0, this is already a valid statement:
A JSON-LD encoding of today's Activity Stream statement would have to do this. Ie, to come back to your question, if the user uses SPARQL using this, it is fine. The usage of @direction is purely a syntactic sugar. Just to make it clear: I do not really like this solution. But we do have a real use case to be solved: publishers may want to use JSON-LD to encode metadata like title or authors, there is need to express titles in different languages and, possibly, directions. At the moment, this is perfectly fine JSON-LD:
but if one wants to add a base direction, the only choice as of now is to add a \u200F or \u200E manually into the text.
we simplify the authors' lives. |
On 2018-02-09, at 11:29, Ivan Herman wrote:
...
Note that, in JSON-LD 1.0, this is already a valid statement:
"title": {
"@value" : "\u200Fthe original text",
"@language" : "en"
}
that is, the already defined encoding suffices, as is.
A JSON-LD encoding of today's Activity Stream statement would have to do this. Ie, to come back to your question, if the user uses SPARQL using this, it is fine. The usage of @direction is purely a syntactic sugar.
Just to make it clear: I do not really like this solution.
which does little to strengthen an argument to accept it.
But we do have a real use case to be solved:
… [whereby]
if one wants to add a base direction, the only choice as of now is to add a \u200F or \u200E manually into the text.
to the extent that the application is compelled to entrain presentation information in the data, that seems an appropriate method and keeps the concern orthogonal from those of the myriad encoding forms.
in particular, of json-ld, which (i had the belief) intends to encode relations among things rather than markup their presentation.
|
On 2018-02-09, at 11:29, Ivan Herman wrote:
what will be the intent, when the feature is to be extended to turtle?
What the appropriate syntactic sugar would be in Turtle: I do not know. I have seen the proposal like "the original text"@en^ltr.
in case you do not appreciate why i am convinced that the apparent intent would be a bad idea, please note these passages
- https://www.w3.org/TR/sparql11-query/#func-arg-compatibility
- https://www.w3.org/TR/sparql11-query/#func-RDFterm-equal
-> http://www.w3.org/TR/rdf-concepts/#section-Literal-Equality
and explain why it would be a good one.
|
it is my fault to have mixed up possibly three issues. Namely:
I am not sure which of the three aspects you object to. Note that using (2) is perfectly valid today, ie, "\u200Fabcdefg" is not equal to "abcdefg", which is perfectly in line with the definitions you refer to. |
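The inequality mentioned above is easy to check with plain string comparison. In the sketch below, `strip_direction_marks` is a hypothetical helper showing what a consumer would have to do before comparing a marked string with an unmarked one; nothing here comes from an RDF or SPARQL library.

```python
RLM = "\u200f"  # RIGHT-TO-LEFT MARK
LRM = "\u200e"  # LEFT-TO-RIGHT MARK

plain = "abcdefg"
marked = RLM + plain

# Term equality on lexical forms is code point by code point, so the
# invisible mark makes these two different literals.
equal_as_literals = marked == plain


def strip_direction_marks(text):
    """Remove leading RLM/LRM characters (hypothetical consumer-side step)."""
    return text.lstrip(RLM + LRM)


equal_after_stripping = strip_direction_marks(marked) == plain
```

The two strings look identical when rendered, yet they compare as different until the mark is removed, which is the behaviour the literal-equality definitions imply.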
On 2018-02-09, at 13:03, Ivan Herman wrote:
@lisp,
it is my fault to have mixed up possibly three issues. Namely:
• Introducing the @direction into JSON-LD 1.1, without any effect on the generated RDF (working for users who use JSON-LD only in a round-trip manner, but the results are not transferred into RDF)
• A way of handling the issue in today's RDF by using the Unicode characters \u200F or \u200E
• Combining the two above by considering the @direction of (1) as a syntactic sugar for (2)
I am not sure which of the three aspect you object to.
all of them in one sense or another, but the second does permit a processor to treat the intended domain value as opaque.
Note that using (2) is perfectly valid today, ie, "\u200Fabcdefg" is not equal to "abcdefg", which is perfectly in line with the definitions you refer to.
in a naive sense yes.
in the sense which is implied by the “^ltr” proposal, as i understand the provisions which those passages from my earlier note describe for string comparison in the presence of language tags, no.
|
Why? |
On 2018-02-09, at 14:34, Ivan Herman wrote:
> Note that using (2) is perfectly valid today, ie, "\u200Fabcdefg" is not equal to "abcdefg", which perfectly in line with the definitions you refer to.
in the sense which is implied by the “^ltr” proposal, as i understand the provisions which those passages from my earlier note describe for string comparison in the presence of language tags, no.
Why?
because the language tag adds an additional dimension to processing control paths in order to accommodate the equivalence rules.
to make matters worse, its definition is not even complete.
either the domain should be expanded and include operations which close over the entire space, or, if that is not the intent, the relationship should be reified via a predicate.
to apply sugar to the situation would be to compound a mistake.
see: https://www.w3.org/TR/sparql11-query/#modOrderBy
|
Consider this a late-Friday afternoon idea for encouraging us to see things differently (for whatever it may teach us).
@base <http://example.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <http://schema.org/> .
@prefix oa: <http://www.w3.org/ns/oa#> .
<#genesis>
  a schema:Book ;
  schema:name <data:text/plain,בראשית> .
<data:text/plain,בראשית> oa:textDirection oa:rtlDirection .
😸 |
I echo previous comments that not being RDF compatible is a no-go. My reason specifically being a very practical one, as somebody who does not even spend much time on the RDF end: it would break linked data signatures, or anything else using the RDF Dataset Normalization algorithm, no? I think a lot of us want to see more usage of json-ld that works in that direction. (Or at least it might leave out information so that two different sets of information could end up having the same normalized structure... bad especially for my hopes to move towards more and more content addressed storage for linked data...) |
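The collision worry can be shown with a toy example. The code below is not the RDF Dataset Normalization algorithm; `to_nquad` is a made-up, simplified serializer that, like an RDF conversion ignoring direction, drops "@direction" before hashing. The subject and predicate IRIs are placeholders.

```python
import hashlib


def to_nquad(subject, predicate, value_object):
    """Serialize one statement as a simplified N-Quads line.

    "@direction" is deliberately ignored, mimicking a conversion to RDF
    that has no place to put it.
    """
    text = value_object["@value"].replace("\\", "\\\\").replace('"', '\\"')
    literal = f'"{text}"'
    if "@language" in value_object:
        literal += "@" + value_object["@language"]
    return f"<{subject}> <{predicate}> {literal} ."


def digest(value_object):
    line = to_nquad("http://example.org/#genesis", "http://schema.org/name", value_object)
    return hashlib.sha256(line.encode("utf-8")).hexdigest()


as_rtl = {"@value": "בראשית", "@language": "he", "@direction": "rtl"}
as_ltr = {"@value": "בראשית", "@language": "he", "@direction": "ltr"}

same_digest = digest(as_rtl) == digest(as_ltr)
```

Two documents that differ only in direction hash to the same value here, so a signature over the normalized form would not cover the direction at all.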
Anyway, that's all to say that I think one of two directions is best for now:
I think we could always revisit if/when RDF gets native |
Deferred to WG due to https://json-ld.org/minutes/2018-04-10/#resolution-3. |
Closed in favor of w3c/json-ld-syntax#11. |
In some situations it is important/necessary to include the base direction of a text, alongside its language; see the “Requirements for Language and Direction Metadata in Data Formats” for further details. In practice, in a vanilla JSON, it would require something like:
(the example comes from that document).
At this moment, I believe the only way you can reasonably express that in JSON-LD is via cheating a bit:
and making sure that the dir term is not defined in the relevant @context so that, when generating the RDF output, that term is simply ignored. But that also means that there is no round-tripping, that term will disappear after expansion.
The difficulty lies in the RDF layer, in fact; RDF does not have any means (alas!) to express text direction. On the other hand, this missing feature is a general I18N problem whenever JSON-LD is used (there were issues when developing the Web Annotation Model, these issues are popping up in the Web Publication work, etc.).
Here is what I would propose as a non-complete solution
@dir term, alongside @language. This means this term can be used in place of dir above, ie, it is a bona-fide part of a string representation, and would therefore be kept in the compaction/expansion steps, can also be used for framing.
@dir is ignored when transforming into RDF. I.e., only the language tag would be used.
3.1. Define a mechanism of "parametrized" standard datatypes that represent a (language,direction) pair. One would then get something like
[] ex:title "موبي ديك"^^rdf:internationalText(ar,rtl) ;
3.2. Go for a "generalized" RDF where strings can also appear as subjects (that has been a matter of dispute for a long time...). That would give the possibility to add such attribute to texts like directions
3.3. Some other mechanisms that I cannot think about
@dir value can be properly mapped onto an RDF representing the right choices (if such choices are worked out)
Cc: @BigBlueHat @r12a
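A small sketch of what "ignored when transforming into RDF" means for round-tripping, under the assumption that a literal is modelled as a (lexical form, language tag) pair. Both function names are hypothetical and stand in for the to-RDF and from-RDF steps of a JSON-LD processor.

```python
def to_rdf_literal(value_object):
    """Keep only what an RDF language-tagged string can carry."""
    # "@dir" is deliberately not read: the model has no slot for it.
    return value_object["@value"], value_object.get("@language")


def from_rdf_literal(lexical, language):
    """Rebuild a JSON-LD value object from the RDF literal."""
    value_object = {"@value": lexical}
    if language is not None:
        value_object["@language"] = language
    return value_object


original = {"@value": "موبي ديك", "@language": "ar", "@dir": "rtl"}
round_tripped = from_rdf_literal(*to_rdf_literal(original))
direction_survived = "@dir" in round_tripped
```

Compaction and expansion would keep @dir under this proposal, but anything that passes through the RDF layer comes back without it.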