This repository has been archived by the owner on Apr 26, 2022. It is now read-only.

Do RS use epub:type structural vocabulary to know where the main content ends? #90

Open
laudrain opened this issue Jul 12, 2019 · 9 comments

@laudrain
Collaborator

laudrain commented Jul 12, 2019

Reading Systems gather knowledge from the reader's behavior. They know that the book has been read entirely when the very last XHTML document in the spine has been paginated through to the very last word.
But more and more EPUB files end with marketing documents (author interview, teaser for the next volume, etc.).

So if the reader doesn't read these last marketing content documents, the book is never reported as finished by the Reading System. And the fact that the book was not finished by the reader may well be taken as a sign that the reader did not find it worth reading to the end...

Thanks to epub:type, publishers use the structural semantics vocabulary to mark where the main content ends. When the reader comes to the end of the last content document with epub:type="bodymatter", then the book is finished!

Do Reading Systems use this markup?
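
As a minimal sketch of the check described above: it assumes every content document carries an epub:type on its body element, which many EPUBs do not, and that the documents parse as well-formed XML without named HTML entities. The names `body_types`, `last_bodymatter_index`, and `make_doc` are hypothetical and do not come from any Reading System.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
EPUB_NS = "http://www.idpf.org/2007/ops"


def body_types(xhtml_text):
    """Return the set of epub:type values found on the <body> element."""
    root = ET.fromstring(xhtml_text)
    body = root.find(f"{{{XHTML_NS}}}body")
    if body is None:
        return set()
    # epub:type holds a space-separated list of values.
    return set(body.get(f"{{{EPUB_NS}}}type", "").split())


def last_bodymatter_index(spine_docs):
    """Index of the last spine document marked bodymatter, or None."""
    last = None
    for index, text in enumerate(spine_docs):
        if "bodymatter" in body_types(text):
            last = index
    return last


def make_doc(kind):
    """Hypothetical helper: build a tiny content document for the demo."""
    return (
        f'<html xmlns="{XHTML_NS}" xmlns:epub="{EPUB_NS}">'
        f"<head><title>demo</title></head>"
        f'<body epub:type="{kind}"><p>text</p></body></html>'
    )


# Spine in reading order: two chapters followed by a marketing document.
docs = [make_doc(k) for k in
        ["cover", "frontmatter", "bodymatter", "bodymatter", "backmatter"]]
print(last_bodymatter_index(docs))  # 3
```

Whether reaching the end of that document means the book is "finished" remains a decision for the Reading System.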

@TzviyaSiegman

Even if RSes use epub:type to assess which HTML file is the last one, this is usually applied at the beginning of a document. If a system is attempting to determine whether a user has finished a book (e.g. completed the last chapter but not read the index, which might have epub:type="backmatter"), will it be able to assess whether the reader has started the last chapter but not finished it?

Further, there are almost no rules around the application of the SSV, so I don't think that this is a reliable method.

@laudrain
Collaborator Author

Structural Semantic Vocabulary is a rule in itself.
And structure is well known from publishing processes.
As "bodymatter" is a value that can be set on a tag, the closing tag shows exactly where the content ends.

@mattgarrish
Member

Couldn't you identify the beginning of the backmatter in the landmarks, same as where the bodymatter starts?
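
A minimal sketch of that landmarks lookup follows. It assumes the navigation document contains a `nav` element with epub:type="landmarks" and that landmark hrefs and spine hrefs resolve against the same directory. The names `landmark_targets`, `backmatter_start_index`, `SAMPLE_NAV`, and `SAMPLE_SPINE` are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
EPUB_NS = "http://www.idpf.org/2007/ops"
EPUB_TYPE = f"{{{EPUB_NS}}}type"


def landmark_targets(nav_text):
    """Map each landmark epub:type value to the first href that carries it."""
    root = ET.fromstring(nav_text)
    targets = {}
    for nav in root.iter(f"{{{XHTML_NS}}}nav"):
        if "landmarks" not in nav.get(EPUB_TYPE, "").split():
            continue
        for link in nav.iter(f"{{{XHTML_NS}}}a"):
            href = link.get("href")
            if href is None:
                continue
            for kind in link.get(EPUB_TYPE, "").split():
                targets.setdefault(kind, href)
    return targets


def backmatter_start_index(spine_hrefs, nav_text):
    """Spine index where the backmatter begins, or None if not declared."""
    href = landmark_targets(nav_text).get("backmatter")
    if href is None:
        return None
    path = href.split("#", 1)[0]  # drop any fragment identifier
    return spine_hrefs.index(path) if path in spine_hrefs else None


# Hypothetical navigation document and spine, for illustration only.
SAMPLE_NAV = f"""<html xmlns="{XHTML_NS}" xmlns:epub="{EPUB_NS}">
<body>
<nav epub:type="landmarks">
<ol>
<li><a epub:type="bodymatter" href="chapter01.xhtml">Start of content</a></li>
<li><a epub:type="backmatter" href="interview.xhtml#top">Author interview</a></li>
</ol>
</nav>
</body>
</html>"""
SAMPLE_SPINE = ["cover.xhtml", "chapter01.xhtml", "chapter02.xhtml",
                "interview.xhtml", "teaser.xhtml"]

print(backmatter_start_index(SAMPLE_SPINE, SAMPLE_NAV))  # 3
```

With a single backmatter landmark, every spine item before the returned index can be treated as preceding the backmatter without marking each content document individually.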

@laudrain
Collaborator Author

They are!
These last content documents beyond the main content are identified with epub:type="backmatter" in the tag.
At Hachette Livre, all our EPUB3 files have a mandatory epub:type on each XHTML body tag.
These epub:type values must follow the logical structural order: cover, frontmatter, bodymatter, backmatter.
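
The ordering rule described above could be checked with something like the following sketch. It assumes exactly one of the four values per content document, read in spine order; `STRUCTURAL_ORDER` and `follows_structural_order` are hypothetical names, not an actual Hachette Livre tool.

```python
# Order stated in the comment above: cover, frontmatter, bodymatter, backmatter.
STRUCTURAL_ORDER = ["cover", "frontmatter", "bodymatter", "backmatter"]
RANK = {name: position for position, name in enumerate(STRUCTURAL_ORDER)}


def follows_structural_order(spine_types):
    """True if the per-document epub:type values never step backwards.

    Missing or unrecognized values fail the check, since the rule
    treats epub:type as mandatory on every body element.
    """
    previous = -1
    for kind in spine_types:
        rank = RANK.get(kind)
        if rank is None or rank < previous:
            return False
        previous = rank
    return True


print(follows_structural_order(
    ["cover", "frontmatter", "bodymatter", "bodymatter", "backmatter"]))  # True
print(follows_structural_order(
    ["cover", "bodymatter", "frontmatter"]))  # False
```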

@dauwhe
Contributor

dauwhe commented Jul 12, 2019

How reading systems decide if a book has been "finished" is far, far outside the scope of the EPUB 3 specification. There is no interop problem. But this is an example where using the SSV can provide a very useful piece of information to a reading system, should it choose to use it. And best of all, this information exists in quite a few existing EPUBs.

I'm wary of suggesting a landmark, just because not much existing content has a useful landmark.

This is really up to reading systems. What if I open a book, follow a landmark to the first backmatter section, and then go backwards by one page? I've technically reached the end of the bodymatter without reading 1% of the book. Of course many books won't have even this information.
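
To make that scenario concrete, here is a sketch of one extra signal a reading system might combine with the end-of-bodymatter marker: the share of bodymatter spine items that were actually displayed. This is an illustrative assumption, not how any existing reading system works, and `bodymatter_coverage` is a hypothetical name.

```python
def bodymatter_coverage(spine_types, visited):
    """Fraction of bodymatter spine items displayed at least once.

    spine_types: epub:type value of each spine item, in reading order.
    visited: set of spine indexes the reader has opened.
    Returns None when the book declares no bodymatter at all.
    """
    body_indexes = [i for i, kind in enumerate(spine_types)
                    if kind == "bodymatter"]
    if not body_indexes:
        return None
    seen = sum(1 for i in body_indexes if i in visited)
    return seen / len(body_indexes)


spine_types = ["cover", "frontmatter",
               "bodymatter", "bodymatter", "bodymatter", "bodymatter",
               "backmatter"]

# The reader jumps to the backmatter (index 6), then goes back one page
# into the last chapter (index 5): the end of the bodymatter was reached,
# but only one of the four chapters was ever opened.
print(bodymatter_coverage(spine_types, {5, 6}))  # 0.25
```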

@TzviyaSiegman

I think we also need to think about whether backmatter is part of the content or not. What about an appendix? Many books have appendices that are really content. Should that be "bodymatter" or "backmatter"?

@mattgarrish
Member

> What if I open a book, follow a landmark to the first backmatter section, and then go backwards by one page?

Couldn't you do the same thing by following the toc to the backmatter and going back one page, whatever method is used, though?

I'm just thinking it would be simpler to have one landmark that identifies the spine item where the backmatter begins and let the reading system worry about tracking the rest than expect authors to mark every content document with semantics.

Isn't the bodymatter landmark generally used for locating the first page to begin reading? I thought that was the one that actually got some uptake.

But I agree this doesn't seem like territory for the specification.

@wareid

wareid commented Jul 12, 2019

This definitely falls into "what should RSs do with what we have". I think the spec covers this case well; "backmatter" is pretty unambiguous. I would say, as a reading system, if I know that there is a backmatter section, I definitely wouldn't want to mark the item as finished without user input. Maybe they do want to read the index (maybe it's a good index!). But currently we don't get this data consistently. If we addressed this as a matter for landmarks or just in the nav doc, I think we could poll RSs on what they would like to see and push it as a best practice.

@arhomberg

Well, as somebody who has spent half a lifetime looking at users' reading behavior, I can say that how people read books and what constitutes "finished" is a great deal more complicated than "final chapter has been paginated" (see, for example, what happened to Amazon's KU when it took "last page synced" as a signal...). Having semantic landmarks for the start and end of body matter (in a novel or a book with narrative structure) is indeed extremely useful in all sorts of scenarios other than "finished book", and how the reading system figures out whether the reader actually finished that book uses a lot more inputs than "last chapter paginated". For non-fiction books that don't have a narrative structure, the situation is altogether different and the landmarks may make no sense...


6 participants