Setting the (text) direction #220

Closed
iherman opened this issue Jun 11, 2018 · 53 comments
Labels
i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response. topic:internationalization topic:manifest topic:metadata topic:schema mapping

Comments

@iherman
Member

iherman commented Jun 11, 2018

This is a completely open issue at this moment, both for JSON-LD and Schema.org... The only (incomplete) approach would be to rely on, and base everything, on the UTF-encoding of the text...

@iherman
Member Author

iherman commented Jun 11, 2018

See also the separate discussion on the JSON-LD 1.1 CG: json-ld/json-ld.org#583

@iherman
Member Author

iherman commented Jun 11, 2018

Another reference: https://w3c.github.io/string-meta/

@danielweck
Member

The only (incomplete) approach would be to rely on, and base everything, on the UTF-encoding of the text...

Do you mean Unicode Bidi? http://unicode.org/reports/tr9/

@iherman
Member Author

iherman commented Jun 12, 2018

@danielweck

More specifically, see json-ld/json-ld.org#583 (comment), which referred to this. The discussion that followed (which was, as far as I am concerned, inconclusive) gave some pros and cons of that approach.

Note that the JSON-LD CG decided to defer that issue to the JSON-LD WG, which has just been formed; I hope that the discussion will restart with more people involved (e.g., Schema.org people as well). We may want to defer this issue to see where that discussion goes.

@iherman
Member Author

iherman commented Jun 13, 2018

@TzviyaSiegman reminded me that there is another approach that is perfectly viable, namely to use the HTML datatype. In practice this means that, if a text has bidirectional issues, it could use HTML syntax, and the result would be considered a string of the HTML datatype in RDF parlance. Here is what it would mean in JSON-LD:

{
    "name" : {
        "@value" : "We find the phrase '<span dir="rtl" lang="he">פעילות הבינאום</span>' 5 times on the page.",
        "@type" : "rdf:HTML"
    }
}

(The trick is to ensure that the character '5' appears on the right-hand side of the Hebrew text. If the span is not used, the number will be treated as if it were part of the Hebrew text and will appear on the left of it!)

From an internationalization point of view, that is much better, because it gives better control. We could therefore say that, for example, for the term name, the author should use that approach. I see two problems with that, too:

  1. It may be an extra load on authors (and maybe not; I am not sure how frequent these occurrences are)
  2. Just as in the case of #219 (Expressing the language of an item, say, "name"), while the above JSON-LD is perfectly fine, the Google structured data tester does not accept it :-(

@laudrain

When written as:

{
    "name" : {
        "@value" : "We find the phrase '<span dir='rtl' lang='he'>פעילות הבינאום</span>' 5 times on the page.",
        "@type" : "rdf:HTML"
    }
}

the Google structured data tester seems to validate it.
I have replaced the double quotes with single quotes for the attribute values, which is OK in HTML5.
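
A minimal sketch of why the quoting matters, assuming Python's standard `json` module stands in for whatever tool writes the manifest (the `html_literal` helper is hypothetical, not part of any spec). A serializer escapes inner double quotes on its own, so both attribute quoting styles end up as well-formed JSON:

```python
import json


def html_literal(fragment: str) -> dict:
    """Hypothetical helper: wrap an HTML fragment as an rdf:HTML typed value."""
    return {"@value": fragment, "@type": "rdf:HTML"}


# The same sentence, with double-quoted and with single-quoted attributes.
double_quoted = (
    "We find the phrase '<span dir=\"rtl\" lang=\"he\">"
    "פעילות הבינאום</span>' 5 times on the page."
)
single_quoted = double_quoted.replace('"', "'")

for fragment in (double_quoted, single_quoted):
    manifest = {"name": html_literal(fragment)}
    # json.dumps escapes any inner double quote as \" in its output.
    serialized = json.dumps(manifest, ensure_ascii=False, indent=4)
    # Round-tripping shows that the serialized text is well-formed JSON.
    assert json.loads(serialized) == manifest
    print(serialized)
```

Hand-written manifests do not get that escaping for free, which is where the single-quote variant is the more forgiving choice.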

The following HTML5 document is valid in https://validator.w3.org/:

<title>I AM YOUR DOCUMENT TITLE REPLACE ME</title>

We find the phrase 'פעילות הבינאום' 5 times on the page.

@iherman
Member Author

iherman commented Jun 13, 2018

@laudrain oops:-) That was my mistake. But then... this looks good as a solution for direction.

However, you as a potential author: how would you like it?

@laudrain

laudrain commented Jun 13, 2018

I like it.
The question is: will it be possible to repeat the name of the author in multiple scripts and directions?

Taking an example from EPUB 3.1 spec[1] with a Japanese name:

{
    "name" : {
        "@value" : "Haruki Murakami",
        "@type" : "rdf:HTML"
    }
}
{
    "name" : {
        "@value" : "<span dir='rtl' lang='ja'>村上 春樹</span>",
        "@type" : "rdf:HTML"
    }
}

Is this correct? Even possible?

[1] http://www.idpf.org/epub/31/spec/epub-packages.html#sec-shared-attrs

@iherman
Member Author

iherman commented Jun 13, 2018

@laudrain which checker did you use? I just tested something that is based on what you ask on https://search.google.com/structured-data/testing-tool and I get an error:

[Screenshot of the error, 2018-06-13 15:15:28]

@iherman
Member Author

iherman commented Jun 13, 2018

(The previous example is accepted by the JSON-LD playground...)

@laudrain

@iherman
Member Author

iherman commented Jun 13, 2018

It is the same tool, @laudrain. However, try to add a "@context":"http://schema.org" :-(

@BigBlueHat
Member

https://schema.org/name is only defined as https://schema.org/Text, so it can't contain HTML. Sorry folks.

@laudrain

End of the game?

@iherman
Member Author

iherman commented Jun 13, 2018

@BigBlueHat

I could argue that "text", at least in RDF land (where it is called a "Literal"), may have a datatype, and that is all the HTML stuff does, but... there is obviously no point in my arguing.

Sigh...

@iherman
Member Author

iherman commented Jun 13, 2018

@laudrain rather back to square one.

Putting Unicode directional control characters into the text works; see the examples in the Activity Streams spec. It is just ugly and may create problems with search.
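
To make that concrete, here is a small sketch, assuming the isolate pair U+2067 (RLI) and U+2069 (PDI) is the markup chosen; the `isolate_rtl` helper is hypothetical. It wraps the Hebrew phrase from the earlier example so that the following "5" is not pulled into the right-to-left run:

```python
import unicodedata

RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate_rtl(text: str) -> str:
    """Hypothetical helper: wrap text in a right-to-left isolate."""
    return f"{RLI}{text}{PDI}"


phrase = "פעילות הבינאום"
sentence = f"We find the phrase '{isolate_rtl(phrase)}' 5 times on the page."

# The controls are invisible when rendered, but they are real code points.
for ch in (RLI, PDI):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Number of extra code points the markup added to the string.
plain = sentence.replace(RLI, "").replace(PDI, "")
print(len(sentence) - len(plain))
```

The "ugly" part is visible in any serialized form: the value carries two characters that no author typed on purpose and that most editors will not display.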

@laudrain

Why problems with search? The characteristics of this code should prevent them:
https://codepoints.net/U+2067

@iherman
Member Author

iherman commented Jun 13, 2018

@laudrain I think (and I am on a bit of a slippery slope here, because I am not an expert in these things) the problem is that search (or a database query) is based on comparing Unicode code points, and it is way too easy to make a mistake and give a search term that does not include those extra characters. That may be the issue. This is certainly the case when doing a database search in a graph database (e.g., using SPARQL).
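
A sketch of that failure mode, under the assumption that matching is a plain code-point comparison. The `strip_bidi_controls` helper and the choice of which characters to strip are illustrative, not taken from any spec:

```python
# Explicit bidi formatting characters: marks, embeddings/overrides, isolates.
BIDI_CONTROLS = frozenset(
    "\u061C\u200E\u200F"              # ALM, LRM, RLM
    "\u202A\u202B\u202C\u202D\u202E"  # LRE, RLE, PDF, LRO, RLO
    "\u2066\u2067\u2068\u2069"        # LRI, RLI, FSI, PDI
)


def strip_bidi_controls(text: str) -> str:
    """Hypothetical normalization step: drop bidi formatting characters."""
    return "".join(ch for ch in text if ch not in BIDI_CONTROLS)


stored = "\u2067村上 春樹"  # value as authored, with a leading RLI
query = "村上 春樹"         # what a user would type into a search box

# Code-point comparison fails because of the invisible leading character.
naive_match = stored == query
# Normalizing both sides first makes the comparison succeed.
normalized_match = strip_bidi_controls(stored) == strip_bidi_controls(query)
print(naive_match, normalized_match)
```

A store that cannot run such a normalization step on both sides (a SPARQL literal comparison, for instance) would see the two strings as different.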

@danielweck
Member

I am not a specialist, but my understanding is that "text search" typically operates on multiple layers of abstraction over Unicode code units and/or code points, in ways that are quite domain-specific. Typically, both the query and input strings need to be normalized (language-specific handling of accented characters, punctuation removal, etc.) and are subject to further heuristic interpretation (conjugation, synonyms, logical combinators, etc.).
I do not have a clear picture of how Unicode BiDi markers affect these processing steps. I would also be interested to know how hard/easy it is to edit such RTL markers into strings in the first place (i.e., in authored metadata properties, and in user-provided search / form input fields).

@laudrain

For language direction, this one seems ok:

{
    "@context":"http://schema.org",
    "@type":"Book",
    "author": {
        "@type":"Person",
        "name": "Haruki Murakami",
        "alternateName": "\u2067村上 春樹"
    }
}

but it lacks the language tag.

@iherman
Member Author

iherman commented Jun 14, 2018 via email

@llemeurfr
Contributor

I may be off base, but I feel that using some alternateName for xlang properties is an issue. Why would one language be the primary one and the others subsidiary, as 'alternate' suggests in practice?

Also, for property values in a single language (I am NOT talking about strings using a mix of LTR and RTL), don't you think that the language attribute is enough for what UAs have to do, i.e., filter the proper variant and display the value?

@iherman
Member Author

iherman commented Jun 14, 2018

@llemeurfr we have not addressed this alternateName issue at all so far, this is only to explore the I18N issues...

Also, for property values in a single language (I am NOT talking about strings using a mix of LTR and RTL), don't you think that the language attribute is enough for what UAs have to do, i.e., filter the proper variant and display the value?

It depends on what we expect from the UA for alternate names which, again, we have not discussed so far. But I believe you are right on a more general level: having the language information available is a necessity.

@llemeurfr
Contributor

@iherman, yes, my main question here is: what would be the practical use of a direction attribute for property values that are in a single language? I think the answer is none.
If this is the case, we should make it easy to express a property in multiple languages, each value being in a single language, i.e., a repeatable property with a language attribute, which is what is possible today with JSON-LD. This may correspond to 99% of the use cases.
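
A sketch of what such a repeatable, language-tagged property could look like, using JSON-LD value objects (`@value` plus `@language`) written as a Python dict. The `pick_variant` helper is a hypothetical UA-side filter, and whether schema.org processors accept this shape is a separate question (see #219):

```python
manifest = {
    "@context": "http://schema.org",
    "@type": "Book",
    "author": {
        "@type": "Person",
        # One property, repeated once per language.
        "name": [
            {"@value": "Haruki Murakami", "@language": "en"},
            {"@value": "村上 春樹", "@language": "ja"},
        ],
    },
}


def pick_variant(values, preferred):
    """Hypothetical helper: return the first value tagged with a preferred language."""
    for lang in preferred:
        for value in values:
            if value.get("@language") == lang:
                return value["@value"]
    # Nothing matched: fall back to the first value listed.
    return values[0]["@value"]


names = manifest["author"]["name"]
print(pick_variant(names, ["ja", "en"]))
print(pick_variant(names, ["fr", "en"]))
```

Note that no variant is marked as primary here; the order of the user's preferences decides, not the order of authoring.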

After this is settled, if we find a way to express values containing a mix of LTR & RTL using Unicode bidi characters or any other markup, fine.

@iherman
Member Author

iherman commented Jun 14, 2018

@iherman, yes, my main question here is: what would be the practical use of a direction attribute for property values that are in a single language? I think the answer is none.

I believe that is correct.

If this is the case, we should make it easy to express a property in multiple languages, each value being in a single language, i.e., a repeatable property with a language attribute, which is what is possible today with JSON-LD. This may correspond to 99% of the use cases.

Again, that is correct. See #219 .

After this is settled, if we find a way to express values containing a mix of LTR & RTL using Unicode bidi characters or any other markup, fine.

Correct again. We can indeed put forward a resolution whereby we rely on the Unicode bidi characters, as the Activity Streams Recommendation does: the advantage is that we can adopt it right away without hitting any obstacle with JSON-LD and/or Schema.org. The disadvantage is that it is a bit complex to author the metadata...

There are some people in the group who may have experience with authoring mixed setups; it would be good to hear whether that approach could work...

@r12a r12a added the i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response. label Jul 6, 2018
@TzviyaSiegman
Contributor

Looks good. I think we need to provide some explanation around "und". We could take that from Activity Streams too.

@llemeurfr
Contributor

I'm not a fan of the names proposed for these properties ("defaultTextDirection" and "defaultTextLanguage"), but this is a bikeshedding detail that can be treated later.
Apart from that detail the proposal is good.

@mattgarrish
Member

Not to bikeshed, but for a bit of brevity could we just use textDirection and textLanguage? Default-iness can be determined from the description.

Otherwise, looks fine to me.

@iherman
Member Author

iherman commented Jul 6, 2018

I'm certainly not bound to those names. textDirection and textLanguage are fine with me.

@iherman
Member Author

iherman commented Jul 7, 2018

Unfortunately, I realized that I have fallen into a trap, and the proposed solution for the default direction is not really clean:-( The problem is with the semantics of what JSON-LD/Schema.org really expresses.

In general, when we have, in the manifest, something like

"id" : "http://www.the.book.id",
"author": "John Doe"

What that means, in English, is that

The author of the publication, whose identifier is http://www.the.book.id, is "John Doe"

I.e., every statement is something we say about the publication with the identifier (or address). However, when we have a statement like "defaultLanguage": "fr", what we want to express is not that the default language of the publication is French, but that the default language of the "metadata" about the publication is French. This is the reason why, in the current draft (not in my proposal), we used the extension of the @context to express the default language for the manifest statements.

Expressing all this properly, though possible, would involve other notions in JSON-LD (i.e., Datasets) that are (a) probably too complicated for most of our users/readers and (b) probably not understood by the schema.org processors. We should not go down that route, imho.

Sigh.

I can see two approaches:

  1. We move back to the @context extension using "@language":"fr" for the default language (i.e., we keep what is in the current draft), and we accept that there is no simple way to express the global, default direction and, therefore, we remove that notion from our infoset. Bidirectional texts are solely expressed by Unicode control characters in the text. (After all, EPUB3 cannot express this either, and it did not seem like a major drawback, although we simply may not have heard of Hebrew, Arabic, Farsi, etc, publishers.)
  2. We accept that there is a semantic impurity, but we keep to the proposal and we consider it a pragmatic solution. The terms put forward are indeed not schema.org but our "own", and we can try to provide a decent rationale in their formal definition in the vocabulary itself. And we move on:-)
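
For reference, option (1) can be sketched as follows, assuming a heavily simplified reading of JSON-LD contexts: only inline context objects are inspected, remote contexts are not fetched, and later entries override earlier ones. The helper names and the sample title are made up for illustration:

```python
def context_default_language(context):
    """Return the last "@language" found among the inline entries of an @context."""
    entries = context if isinstance(context, list) else [context]
    language = None
    for entry in entries:
        # String entries are remote contexts; this sketch does not fetch them.
        if isinstance(entry, dict) and "@language" in entry:
            language = entry["@language"]
    return language


def effective_language(value, context):
    """A value object's own @language wins; otherwise use the context default."""
    if isinstance(value, dict) and "@language" in value:
        return value["@language"]
    return context_default_language(context)


manifest = {
    "@context": ["http://schema.org", {"@language": "fr"}],
    "name": "Les Misérables",
    "alternateName": {"@value": "The Wretched", "@language": "en"},
}

ctx = manifest["@context"]
print(effective_language(manifest["name"], ctx))
print(effective_language(manifest["alternateName"], ctx))
```

There is no counterpart for direction in this sketch because the context offers no keyword for it, which is exactly the gap option (1) accepts.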

Under the adage that usability and authors'/users' interest has a higher priority than theoretical purity, I am mildly in favor of (2) above. But if we do that, we have to realize what is happening, ie, that we are cheating...

(My apologies for not having realized this when I made the proposal.)

Cc @TzviyaSiegman @mattgarrish @llemeurfr

@mattgarrish
Member

After all, EPUB3 cannot express this either, and it did not seem like a major drawback

EPUB does allow the default directionality to be specified through the dir attribute on the package element. You can also override it on each text-carrying element.

The problem with minting stuff ourselves is that we'll be stuck supporting it for as long as the format exists. It might be useful to add our own solution and highlight it as an issue we need feedback on in the next working draft.

@iherman
Member Author

iherman commented Jul 8, 2018

@mattgarrish that is fine. Unless there are major objections, you should then add a note to the draft (maybe also referring to the problems outlined above) and merge to the main branch...

(I’m on vacation for 10 days, I won’t do it now...)

@llemeurfr
Contributor

Reading the https://w3c.github.io/wpub/#language-and-dir section with fresh eyes, I feel that implementers will hugely misunderstand what these two properties are for.
On language, because what publishers want to express is mostly the "language of the book" (dc:language).
On base direction, because most will confuse this with the page progression direction.

So I would rather suppress the whole section and state that the language of the metadata will be inferred from the language of the book itself (i.e. the content), unless specified on the metadata value itself. This is short and pragmatic (the border between content and metadata is thin).

And we must acknowledge that there is no perfect solution today on the Web (and in JSON-LD) for expressing the base direction of metadata values in edge cases, therefore we'll stick with https://w3c.github.io/string-meta/ recommendations and JSON-LD specification.

@iherman
Member Author

iherman commented Jul 18, 2018

@llemeurfr, I first want to have a clear understanding of what you propose. Is it that:

  1. The language setting and base direction of the "primary entry page" (if any) also provide the (default) language and directionality for all the textual manifest items. In other words, if the manifest is embedded in the entry page, it "inherits" those values.
  2. There is no way to set the default language/base direction for the manifest textual items in the manifest itself.
  3. Using the JSON-LD facilities (hoping that, eventually, schema.org processors will accept these) it is possible to set a specific language for an individual textual manifest item.
  4. It is not possible to set the directionality of an individual textual manifest item except through the UTF direction marks.

Provided this is indeed what you propose, my 2 cents:

  • First of all, I can live with (1) (even if we maintain the rest of what we have); we already "inherit" the title from the enclosing primary entry page, ie, one could also be fine with the language. Although, we have to realize that it is not crystal clear semantically: just like the <meta> elements in the enclosing HTML file provide metadata for the enclosing document only (and not for a collection), and hence it is semantically erroneous to mix these two, the same holds (in my view) for language tags. We may have to transgress purity in favor of author's ease...
  • For (2): I simply do not know how often one has a situation whereby the language of the publication and the language of the manifest would differ. The message I got early on in the WG discussions was that these two may be different, and hence it is important to separate these. If that is not the case (this is the decision of the group, obviously) then, of course, the combination of (1) and (2) is fine with me.
  • All that being said, putting aside the deficiencies of schema.org processors, setting the language in the context for all textual values is a legal and existing JSON-LD facility. Does it mean that we would have to explicitly disallow its usage?
  • I presume we are in agreement on (3) and (4).

@iherman
Member Author

iherman commented Jul 19, 2018

@llemeurfr is it o.k. if I prepare a separate draft (not necessarily a PR yet) that is based on the idea that the language/dir is inherited from the primary entry page, and we can then look at that? Thinking about it further since yesterday this may be a much better option indeed, with the least of the semantic issues...

If you are fine, I can try to do this before our call on Monday.

Cc @mattgarrish

@llemeurfr
Contributor

@iherman this is not what I have in mind. I'll try to express it in a clearer manner:

  • https://w3c.github.io/wpub/#properties-intro states that Descriptive properties describe aspects of a Web Publication, such as its title, creator, and language.
  • the section corresponding to this language metadata is 4.4.5, which describes the language of "Each textual property in the Web Publication's infoset", which is inconsistent with the introduction.
  • So, let's rename 4.4.5 "Language" and define here "a language of the publication". This is what publishers will expect.
  • Let's add in this section that the default language of the textual properties associated to the publication can be inferred from this value (which is intuitive, and is the reverse of point 2 in your list) and that individual textual properties can override this default language by expressing a value using the JSON-LD format (this is point 3 in your list).
  • And add your point 4 about the directionality of an individual textual manifest item.

nb: I would be against point 1 in your list, the inference is too remote.

@mattgarrish
Member

which describes the language of "Each textual property in the Web Publication's infoset", which is inconsistent with the introduction.

We never did resolve that issue - how EPUB uses dc:language for the publication and xml:lang for the package metadata values.

If we require that the first language code listed be the default language of the publication and manifest values (i.e., the property is either a single value or an array of values), then it probably makes as much sense as any other approach for now.
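
A minimal sketch of that rule, assuming the property value is either a single language code or an array of codes. The helper names are hypothetical, and falling back to "und" when nothing is listed is an assumption of this sketch rather than something decided in the thread:

```python
def language_list(value):
    """Normalize a single language code or an array of codes to a list."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def default_language(value, fallback="und"):
    """The first listed code is the default for the publication and manifest values."""
    codes = language_list(value)
    return codes[0] if codes else fallback


print(default_language("fr"))
print(default_language(["ja", "en"]))
print(default_language(None))
```

Normalizing to a list first keeps the single-value and array forms on one code path, so the "first entry wins" rule is stated only once.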

@iherman
Member Author

iherman commented Jul 19, 2018

@llemeurfr that is indeed radically different, just as I got to like 'inheriting' the language/dir settings from the HTML level...:-)

However... I see a serious problem with what you propose. You give the manifest the primary role in setting the language. However, that information will be invisible to vanilla (i.e., not WP-aware) browsers. This means that the language of the real (HTML) content will be considered "und" unless the language is set on an HTML element as well: a source of redundancy. And then, of course, we may have an issue if the two are in conflict: English is set in the manifest and French in the content. What happens then?

Unfortunately, for me, that is a serious flaw and I would not be in favour of that approach...

I would actually argue for what I thought you had convinced me about:-): The case of the embedded manifest is particularly attractive: the language and direction are set on, say, the <html> element and the manifest automatically inherits them (unless they are explicitly set otherwise). Actually, one can also use the HTML facilities, and use <script lang="fr"> as well or, even, <script lang="ar" dir="rtl">, which would be a way to set the default language and direction for the manifest. It sounds fairly clean to me.
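
A rough sketch of that inheritance for the embedded case, using Python's standard `html.parser`. It assumes the manifest sits in a `<script type="application/ld+json">` element and treats both `lang` and `dir` as simply inherited from the nearest ancestor that sets them, which glosses over details such as `dir="auto"`. The class and function names are made up:

```python
from html.parser import HTMLParser

# Elements with no end tag; they must not stay on the ancestor stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class ManifestLangFinder(HTMLParser):
    """Record the lang/dir in effect on the first embedded JSON-LD <script>."""

    def __init__(self):
        super().__init__()
        self.stack = []    # (tag, lang, dir) for each open ancestor
        self.found = None  # (lang, dir) of the manifest script, once seen

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        _, parent_lang, parent_dir = (
            self.stack[-1] if self.stack else (None, None, None)
        )
        # An explicit attribute wins; otherwise inherit from the ancestor.
        lang = attrs.get("lang", parent_lang)
        direction = attrs.get("dir", parent_dir)
        is_manifest = tag == "script" and attrs.get("type") == "application/ld+json"
        if is_manifest and self.found is None:
            self.found = (lang, direction)
        if tag not in VOID:
            self.stack.append((tag, lang, direction))

    def handle_endtag(self, tag):
        # Pop back to the matching open element, tolerating omitted end tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def manifest_lang_dir(html: str):
    """Return (lang, dir) for the embedded manifest, or None if there is none."""
    finder = ManifestLangFinder()
    finder.feed(html)
    finder.close()
    return finder.found


page = (
    '<html lang="ar" dir="rtl"><head><meta charset="utf-8">'
    '<script type="application/ld+json">{"name": "x"}</script>'
    "</head><body></body></html>"
)
print(manifest_lang_dir(page))
```

An explicit `<script lang="fr">` overrides the inherited value in this sketch, matching the "unless explicitly set otherwise" part of the idea.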

It is indeed a bit more 'distant' in the case of a separate manifest file but, there again, we could say that the language and dir on the <link> element (which, per HTML, is inherited from its ancestors unless explicitly set) is the one valid for the manifest (again, unless the manifest changes it explicitly). It is a bit less clean than the embedded case, but works.

In both cases the advantage is that a vanilla browser understands the language setting from the HTML, ie, there will be no possible discrepancy in the rendering. That is a major plus. (And is better than the current draft, actually!)

@llemeurfr
Contributor

@iherman I consider it required to set the language on each HTML resource individually, as is the practice on the Web. Voice engines and other tools will make good use of it.
I don't see any issue if the two (publication metadata in the manifest and resource level information) are in conflict, as they will be used by different tools. **it happens, but it won't break the experience of the user.

@mattgarrish I agree that it should be the first language value, as Jiminy advocated in its internationalization paper.

@mattgarrish
Member

I consider it required to set the language on each HTML resource individually

Yes, the language specified in the manifest is not used to set the language of the resources, just as it isn't in EPUB. It's there to provide context. The usual examples are to preload tts languages, offer to download dictionaries, etc.

@llemeurfr
Contributor

the language specified in the manifest is not used to set the language of the resources; it's there to provide context. The usual examples are to preload tts languages, offer to download dictionaries, etc.

... a very good editorial note to add to the spec of this language property.

@iherman
Member Author

iherman commented Jul 19, 2018

the language specified in the manifest is not used to set the language of the resources; it's there to provide context. The usual examples are to preload tts languages, offer to download dictionaries, etc.

... a very good editorial note to add to the spec of this language property.

But isn't that against what you propose, @llemeurfr? The language specified in the manifest is, in your proposal, considered to be the language of the content, too. I.e., it does (much) more than provide context...

Even if we consider the possible conflict as a negligible issue I think that we would introduce a source of further confusion. And, per @mattgarrish

Yes, the language specified in the manifest is not used to set the language of the resources, just as it isn't in EPUB.

ie, what you propose would be the contrary of what EPUB does...

@llemeurfr
Contributor

@iherman no, it's "a language of the publication" and, by inference, also the default language of descriptive metadata if in first position in a list. If there is only one publication language and it's not what the UA finds when getting the language of an HTML resource, there is an editorial discrepancy. But so what?

@iherman
Member Author

iherman commented Jul 20, 2018

@llemeurfr,

I am trying to see what you propose (putting aside how this should be edited into the document).

  1. We use the schema.org inLanguage term as defined. This means it defines the language of the publication (which is the "subject" of the manifest's statements), and we consider this as the (default) language of manifest's textual terms as well.
  2. Using the JSON-LD facilities (hoping that, eventually, schema.org processors will accept these) it is possible to set a specific language for an individual textual manifest item.
  3. It is not possible to set the directionality (either globally or locally) of a textual manifest item except through the UTF direction marks.

An alternative to (3) is that we do introduce our own term for direction, setting the global base textual direction for the publication (going in pair with inLanguage and hoping that, at some point, this will become a bona fide schema.org term). This means that it does become impossible to set the directionality of an individual text item, but at least we have something as a global value.

Does this reflect your proposal? If so, we do have two fairly distinct proposals to (finally) close this issue: this one, and the one I described in #220 (comment)

@llemeurfr
Contributor

@iherman, items 1,2 and 3 reflect my position, yes (thank you for pointing at inLanguage).

Re. the alternative to 3 you're proposing, my issue is that I don't know what a direction property would be used for. Not for categorizing publications, not for displaying property values ...

@iherman
Member Author

iherman commented Jul 20, 2018

@llemeurfr

It is the same as the dir attribute in HTML. User agents may choose to put the table of contents popup on the right of the screen instead of the left, as is customary.

@JayPanoz

... a very good editorial note to add to the spec of this language property.

Indeed, because in EPUB-land, some people assume that you only have to set the one in the manifest and you’re good to go. And resources are then missing xml:lang or lang, and some reading systems then use the manifest’s as a fallback and append the attributes because TTS but also default fonts, hyphenation, some CSS props like text-transform, how to break lines, etc. all depend on the language of the resource…

@iherman
Member Author

iherman commented Jul 22, 2018

Actually, @llemeurfr (and others): there may be a discrepancy between the cases when the JSON-LD is embedded via a <script> and when it is a separate resource.

Indeed, when it is an embedded resource, there are some general questions about what the JSON-LD "inherits" from its surroundings. I raised the issue a while ago in the JSON-LD WG on what the base URL is for embedded JSON-LD (which is a question of relevance for the WP manifest, too) and, though it seems logical that the document URL is the one, this is not strictly defined in JSON-LD 1.0 (hopefully it will be for JSON-LD 1.1, taking into account that embedded JSON-LD is the format understood by schema.org processors). "Inheriting" the default language would fall into the same category. In other words, in the case of an embedded manifest the "inheritance" from the primary entry page seems to be the natural move.

Could there be a small difference between the two? ie,

  • for embedded JSON-LD the language (and direction) setting valid for the <script> element is inherited by the JSON-LD content, and is also considered as the language for the publication
  • for a separate JSON-LD the language must be set explicitly as described in #220 (comment)

WDYT?

@llemeurfr
Contributor

There is a danger if the behavior of a "detached" manifest is different from the behavior of an "embedded" manifest. A manifest should be attachable/detachable with no modifications.

@css-meeting-bot
Member

The Working Group just discussed publishing new draft.

The full IRC log of that discussion
<dauwhe> Topic: publishing new draft
<dauwhe> ... we have a few open issues
<tzviya> https://github.com//issues/261
<dauwhe> Github: https://github.com//issues/261
<dauwhe> ... this is cover vs cover-image
<dauwhe> ... look at last comment from Matt
<dauwhe> ... we're concerned about the infoset
<dkaplan3> q+
<dauwhe> ... Matt says we should be concerned with language
<dauwhe> ... so we're just discussing changing language
<tzviya> ack dkaplan3
<dauwhe> ... should we say cover or cover image or cover page
<tzviya> ack dkaplan3
<dauwhe> dkaplan3: the one thing that has happened in github
<ivan> zakim, who is here?
<Zakim> Present: dauwhe, ivan, tzviya, wolfgang, Juan_Corona, jbuehler, Avneesh, JuanCorona, wendyreid, dkaplan, laudrain, JunGamo, Hadrien, makoto, jpyle, josh, gpellegrino, George,
<Zakim> ... BenWaltersMS, Franco, caitlingebhard, laurentlemeur, duga, marisa
<Zakim> On IRC I see marisa, derekjackson, ReinaldoFerraz, lsullam, rkwright, duga, laurentlemeur, Franco, caitlingebhard, BenWaltersMS, Makoto, josh, cmaden2, Hadrien, JunGamo, EvanOwens,
<Zakim> ... laudrain, wendyreid, JuanCorona, jbuehler, George, Karen, dkaplan3, Avneesh, RRSAgent, Zakim, ivan, wolfgang, dauwhe, tzviya, plinss, Rachel, github-bot, astearns, bigbluehat,
<dauwhe> ... the people who wanted a discrete cover page
<Zakim> ... jyasskin
<dauwhe> ... I think the people in github would be fine with cover image
<dauwhe> ... when I gave the whole "here are some guidelines" thing
<dauwhe> ... I think people bring up stuff that doesn't need to be in the infoset
<harriett> +
<dauwhe> ... it's fine to document these extra things
<tzviya> q?
<tzviya> ack dkaplan
<dauwhe> ... so we should go back to the github issue later
<dauwhe> ... I think my comment addressed everything except for the infoset Q about a cover that is not a cover image
<dauwhe> tzviya: perhaps we can open a new issue
<dauwhe> ... the proposal that you had, can you sum it up?
<dauwhe> dkaplan3: my proposal for infoset purposes
<dauwhe> ... I was going based on the assumption that because
<dauwhe> ... Ivan reminded us that at the F2F there needed to be the idea of a cover, that might not be image
<dauwhe> ... I don't think we need both cover and cover-image
<josh> q+
<dauwhe> ... but if people feel strongly about a cover that is not an image that still needs to be in the infoset
<dauwhe> ... the reason people want cover images in infoset is for shelf view, etc
<dauwhe> ... that reasoning doesn't apply to a cover
<laurentlemeur> q+
<tzviya> ack josh
<dauwhe> ... will anyone go to bat for needing a non-image cover IN THE INFOSET
<dkaplan3> q+
<dauwhe> josh: I would make a strong case for a cover that's not an image because not all content includes imagery
<dauwhe> ... just point to something, and if it's an image then great, if not they could render the html
<dkaplan3> Josh: see https://github.com//issues/261#issuecomment-406696836
<dauwhe> ... for scholarly articles, the cover would be title/author/ journal / issue
<dkaplan3> This comment specs out all of that.
<dauwhe> tzviya: that's already been mentioned in an issue
<dauwhe> josh: but there are 70 comments
<tzviya> ack laurentlemeur
<dauwhe> ... I don't think we should have both a cover and cover image
<dauwhe> laurentlemeur: we should close issue by saying we define cover-image
<dauwhe> ... discuss elsewhere if we need another type of cover
<dauwhe> ... user agents could assemble image from metadata, wouldn't need html
<dauwhe> tzviya: josh proposed just cover
<dauwhe> ... the publisher can include image OR text in html
<dauwhe> ... the user agent would do some magic to display
<tzviya> ack dkaplan
<dauwhe> laurentlemeur: I think the magic to display HTML is more than magic to assemble from metadata
<dauwhe> dkaplan3: I put a link to my github comment
<dauwhe> ... for later, when we are writing recs for what UAs should do
<dauwhe> ... we will need to have guidelines for what to do when you don't have a cover
<dauwhe> ... I'm happy with not having both
<dauwhe> ... the diff between Laurent and Josh
<dauwhe> ... in the absence of an image, do we recommend that the UA extract metadata and make a cover?
<dauwhe> ... or do we think UAs should try to define a text cover somehow
<dauwhe> ... I would go with Laurent
<dauwhe> ... it's a standard practice now that you get title/creator in shelf view if there's no cover
<wendyreid> q+
<dauwhe> ... if your business case is that it's important to have specific information on the cover, then you should probably actually create an image
<tzviya> ack wendyreid
<dauwhe> wendyreid: from experience with Kobo, that's what we do
<dauwhe> ... if we have image we use it
<ivan> q+
<dauwhe> ... if there's no image file, then we create cover with metadata
<dauwhe> ... that's very standard
<tzviya> ack ivan
<dauwhe> ivan: I don't understand the controversy
<dauwhe> ... I thought Josh's proposal was fine
<dauwhe> ... there's a cover, if you put image there you get image, if HTML is there you render that
<dauwhe> ... and in the scholarly world, title and author might not be enough
<dauwhe> ... you need standard metadata
<josh> +1 to Ivan expressing my business case better than I did.
<laurentlemeur> q+
<dauwhe> ... I don't understand the problem
<tzviya> ack laurentlemeur
<dauwhe> tzviya: a reminder that we're only talking about the infoset
<dauwhe> laurentlemeur: if we follow josh, it means every UA will have to be able to take an arbitrary HTML file or something else and try to make a cover out of it
<josh> q+
<dauwhe> ... this puts a burden on user agents
<tzviya> ack josh
<dauwhe> josh: I think that UAs should do what they think is best
<dauwhe> ... using this approach, you have something called a cover that points to image or file
<dauwhe> ... if UA doesn't know how to turn html into shelf-view icon, it can still use metadata
<dauwhe> ... we should provide as much guidance as possible to UA, then let UA choose
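The fallback josh describes (use the image if the cover is an image, render the HTML if the UA can, otherwise build something from metadata) could be sketched roughly as follows. This is only an illustration under assumptions: the function name `choose_cover`, the shape of the `cover` dictionary (`url`, `encodingFormat`), and the returned dictionaries are hypothetical and not taken from any draft.

```python
from typing import Optional

# Media types treated as "an image" in this sketch (an assumption, not a spec list).
IMAGE_TYPES = {"image/jpeg", "image/png", "image/gif", "image/svg+xml"}


def choose_cover(cover: Optional[dict], metadata: dict, can_render_html: bool) -> dict:
    """Pick what a user agent could show in a shelf view.

    `cover` is a hypothetical infoset entry such as
    {"url": "cover.jpg", "encodingFormat": "image/jpeg"}.
    """
    if cover is not None:
        media_type = cover.get("encodingFormat", "")
        if media_type in IMAGE_TYPES:
            # The cover points at an image: use it directly.
            return {"kind": "image", "url": cover["url"]}
        if media_type == "text/html" and can_render_html:
            # The cover points at HTML and this UA knows how to render it.
            return {"kind": "html", "url": cover["url"]}
    # No usable cover: assemble one from metadata, as wendyreid describes.
    return {
        "kind": "generated",
        "title": metadata.get("name", "Untitled"),
        "creator": metadata.get("author", ""),
    }
```

A UA that cannot render HTML would call `choose_cover(cover, metadata, can_render_html=False)` and still get a metadata-based cover, which is the point of leaving the choice to the UA.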
<George> q+
<tzviya> ack George
<dauwhe> tzviya: we might need to decide to publish without this
<dauwhe> George: just an image is too limiting in terms of looking at the future
<dauwhe> ... I see discussions about VR and innovation in the book space
<dauwhe> tzviya: [1] include cover which could be anything or [2] just a cover image ?
<laurentlemeur> 2
<dkaplan3> 2
<JuanCorona> 1
<Hadrien> 2
<ivan> 1
<tzviya> 1
<caitlingebhard> 1
<josh> 1
<wendyreid> 1
<derekjackson> 1
<wolfgang> 1
<Franco> 1
<rkwright> 2
<gpellegrino> 2
<laudrain> 2
<George> 1 cover
<lsullam> 1
<clapierre> 1
<marisa> 1
<jbuehler> 1
<MustlazMS> 1
<rkwright> 1
<dauwhe> tzviya: I see more 1s than 2s
<ivan> q+
<rkwright> I inadvertently entered a 2.
<tzviya> ack ivan
<dauwhe> ivan: I would propose to put there the more permissive approach 1, publish a draft (which isn't final)
<dauwhe> ... and see what the community has to say
<dauwhe> ... we don't have unanimity
<dauwhe> ... this is just a draft
<dauwhe> ... easier to restrict early
<tzviya> https://github.com//issues/220
<dauwhe> github: end topic
<dauwhe> github: https://github.com//issues/220
<dauwhe> ivan: we had question of direction ltr rtl
<dauwhe> ... trouble expressing in JSON
<dauwhe> ... consensus in discussion that for rtl/ltr we have no other means than falling back on Unicode directional markers
<dauwhe> ... and no explicit default direction setting
<dauwhe> ... i think laurentlemeur we have agreement
<dauwhe> laurentlemeur: yes
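With no explicit direction setting, a consumer has to derive the base direction from the string itself. One common way to do that is a first-strong heuristic: take the first character with a strong bidi class. The sketch below assumes that heuristic (the minutes do not name a specific algorithm) and is simplified compared with Unicode UAX #9, since it does not skip characters inside isolates. The function name is hypothetical. Because LRM (U+200E) and RLM (U+200F) are themselves strong characters, a leading mark decides the result.

```python
import unicodedata
from typing import Optional


def first_strong_direction(text: str) -> Optional[str]:
    """Return "ltr", "rtl", or None when the text has no strong character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    # Only digits, punctuation, whitespace, etc.: direction is undetermined.
    return None
```

An author who needs a Latin-script title to be treated as right-to-left would prefix it with RLM, for example `first_strong_direction("\u200fHTML")`.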
<dauwhe> ivan: a more general issue came up
<dauwhe> ... related to the language setting
<dauwhe> ... there are 2 different things
<dauwhe> ... 1. if I set language in manifest in one of schema.org terms inLanguage
<dauwhe> ... this means I set language for publication at large + text of manifest
<dauwhe> ... individual resources may not set their own languages, there may be discrepancies
<dauwhe> ... 2. the other approach is more complicated
<dauwhe> ... when the manifest is embedded in html
<tzviya> q+
<dauwhe> ... looking at that case it would be logical that the script element with manifest inherits language and dir of entry page
<dauwhe> ... so if it's part of html then I could and maybe should refer to what HTML does
<dauwhe> ... we still say that if you do it that way you're talking about the publication as a whole
<dauwhe> ... when we have an embedded manifest, do we inherit the HTML settings?
<dauwhe> ... or not?
<dauwhe> ... an additional argument... this is one thing from HTML structure that we will inherit
<dauwhe> ... this is the base URL
<tzviya> ack tzviya
<laurentlemeur> q+
<dauwhe> tzviya: I put myself on the queue
<dauwhe> ... the schema.org group is aware they have language issues, but they're trying to work it out
<dauwhe> ... they know language on particular tags is a problem
<dauwhe> ivan: yes, the setting of a language for an individual text is already there
<dauwhe> ... we hope schema.org handles it eventually
<dauwhe> ... the json-ld working group, partly on my instigation, is looking at issues of embedded json-ld
<dauwhe> ... it was non-normative in 1.0
<dauwhe> ... for example, there's no resolution on inheriting baseURL
<tzviya> ack laurentlemeur
<dauwhe> ... I think that will be resolved in 1.1
<dauwhe> laurentlemeur: in fact, here we are trying to do 2 things
<dauwhe> ... the language of publication is descriptive metadata
<dauwhe> ... if we want to infer the language of manifest, that's simple
<dauwhe> ... we can do that in json-ld with @language in context
<dauwhe> ... so we are trying to simplify work of authors
<dauwhe> ... by inferring from publication language
<dauwhe> ... and maybe we should inherit from html if manifest is embedded
<dauwhe> ... but that makes processing of detached and embedded manifests different
<dauwhe> ... this is why we should infer language from publication language
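The inference laurentlemeur describes might look like the sketch below: an `@language` entry in the JSON-LD context sets the default language of the manifest's text, and when it is absent the value of `inLanguage` is used instead. The precedence order, the function names, and the choice of the first entry when `inLanguage` is a list are assumptions made for illustration, not decisions recorded in these minutes.

```python
from typing import Any, Optional


def context_language(context: Any) -> Optional[str]:
    """Return the @language set in a JSON-LD @context, if any.

    A context may be a string, an object, or an array of those;
    later entries override earlier ones.
    """
    entries = context if isinstance(context, list) else [context]
    language = None
    for entry in entries:
        if isinstance(entry, dict) and "@language" in entry:
            language = entry["@language"]
    return language


def manifest_text_language(manifest: dict) -> Optional[str]:
    """Language assumed for natural-language strings in the manifest."""
    explicit = context_language(manifest.get("@context"))
    if explicit is not None:
        return explicit
    # Fall back on the publication language (assumed rule).
    in_language = manifest.get("inLanguage")
    if isinstance(in_language, list):
        return in_language[0] if in_language else None
    return in_language


manifest = {
    "@context": "https://schema.org",
    "name": "Moby Dick",
    "inLanguage": "en",
}
```

Here the context line stays untouched and `manifest_text_language(manifest)` falls back on `inLanguage`, which keeps the behaviour the same for detached and embedded manifests.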
<dauwhe> tzviya: what q are we answering?
<dauwhe> ivan: the current text needs to be rewritten
<dauwhe> ... at least for embedded version there are two ways of rewriting
<dauwhe> ... if you want detachable things then use @language
<dauwhe> ... we have two difficulties, i agree with this one
<dauwhe> ... if we ignore surrounding html we will end up defining something which is not aligned with how json-ld is used in html
<dauwhe> ... I don't know which one is a bigger danger
<dauwhe> tzviya: the Q is, whether we use something we are sure will work detached or embedded, but might overwrite default/be in conflict with processors
<dauwhe> ivan: I don't think this is correct
<dauwhe> ... if you use @language it works everywhere
<dauwhe> ... if I don't put anything in JSON-LD or any other thing, what happens then?
<dauwhe> ... in one case a language might be inherited, in the other case not
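The discrepancy ivan points at can be made concrete with a small sketch. It reads `lang` and `dir` from the root `html` element of an entry page, extracts the embedded manifest, and reports the defaults with and without inheritance. Everything here is an assumption for illustration: the class and function names are hypothetical, only the root element is consulted (real HTML inheritance follows the nearest ancestor), and no draft defines this behaviour.

```python
import json
from html.parser import HTMLParser


class EntryPageParser(HTMLParser):
    """Collects lang/dir of <html> and the text of the JSON-LD script."""

    def __init__(self):
        super().__init__()
        self.lang = None
        self.dir = None
        self._in_manifest = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "html":
            self.lang = attributes.get("lang")
            self.dir = attributes.get("dir")
        elif tag == "script" and attributes.get("type") == "application/ld+json":
            self._in_manifest = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_manifest = False

    def handle_data(self, data):
        if self._in_manifest:
            self._chunks.append(data)

    def manifest(self) -> dict:
        return json.loads("".join(self._chunks))


def embedded_defaults(html_text: str, inherit: bool) -> dict:
    """Default language and direction for an embedded manifest."""
    parser = EntryPageParser()
    parser.feed(html_text)
    parser.close()
    language = parser.manifest().get("inLanguage")
    if language is None and inherit:
        language = parser.lang  # inherited from the entry page
    return {"lang": language, "dir": parser.dir if inherit else None}


PAGE = (
    '<html lang="ar" dir="rtl"><head>'
    '<script type="application/ld+json">{"name": "example"}</script>'
    "</head><body></body></html>"
)
```

With `inherit=True` the manifest picks up `ar` and `rtl` from the page; with `inherit=False`, or once the same manifest is detached, both are undefined.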
<laurentlemeur> q+
<tzviya> ack laurentlemeur
<dauwhe> ivan: @language is so far away from the standard syntax for authors; extending a context is very difficult to follow
<dauwhe> laurentlemeur: the context line in json-manifest should be copy/paste; should not be edited
<dauwhe> ... it's not about metadata, not about structure
<dauwhe> tzviya: can we come to consensus?
<dauwhe> ivan: we should do a PR knowing there are issues, and we don't know
<dauwhe> ... I can try to write up the more complex situation and see where it goes
<dauwhe> laurentlemeur: let's try to write it
<dauwhe> ivan: I will come up with a PR, hopefully this week
