Blacklist without subtree #19

marcuswu · 2014-11-04T15:54:53Z

There are some use cases where applyBlacklist does not work. applyBlacklist removes the entire subtree starting from the element that is blacklisted. If the injected element surrounds original content (for instance in the case of a span tag to highlight text in a spine item), any content surrounded by the span tag is no longer CFI addressable which would also lead to calculation of CFIs within the wrapped content to be incorrect.

I have altered applyBlacklist to instead only ignore those elements directly marked for the blacklist. Any children of the element not marked for blacklist are logically promoted to the spot of the blacklisted parent.

This means that all elements that should be blacklisted must be directly marked (subtrees are no longer automatically filtered out), but it also means one can wrap content in tags and it will not affect addressing those elements via CFI.

I think this is necessary for highlighting and notes functionality.

I had these committed in our own fork for some time and I noticed that those commits also have some other random fixes integrated in them. Sorry, but I didn't see an easy way to tease those out. If desired, I can squash these into one commit.

Since this departs from existing functionality, I fully expect some conversation surrounding these changes and I welcome it.

* Handle node injection after the all text in a text node * Apply blacklist to a node, not its subtree * Fix a couple of other bugs

danielweck · 2014-11-04T16:47:51Z

Thanks @marcuswu
I believe the current technique works, as it consists in visually overlaying a positionally-fixed semi-transparent coloured area (div element(s)) on top of the range of characters for the selected text highlight. As a result, the document is in fact not polluted by additional "inline" markup, as instead the overlay itself is blacklisted (entire HTML fragment(s) for each highlighting div, of which the DOM subtree can be entirely skipped / ignored by CFI processing), and inserted at the end of the body (non-disruptive DOM location with respect to CSS selectors matching, CFI resolving).

Update: as @jccr reminded me, "marker" elements with empty content are also inserted, which is non-disruptive to CFI processing (whole subtree can be safely ignored/skipped). There is the potential edge-case of broken CSS selectors (in cases whereby the "sibling" relationship is used), but that is not a big concern due to the rare probability of such problem occurring, or having a significant impact on document rendering (same remarks applies to the last sentence at the bottom of this comment).

Code references:
https://github.com/readium/readium-shared-js/blob/develop/lib/annotations_module.js
https://github.com/readium/readium-shared-js/blob/develop/js/views/annotations_manager.js

That being said, I like the idea of inserted markup being truly "transparent", in the structural sense. Besides the bookmarks / annotations use-case, this could be useful (for example) when wrapping images/videos in an extra div to achieve some kind of rendering / interaction effect.

More importantly, this would eliminate some existing caveats with the "overlay" technique: semi-opacity is a suboptimal way to create a background colour for the text selection, as it produces undesirable (uncontrollable) font colour. Furthermore, the overlay div(s) need to be visually relocated (CSS fixed positioning) for example when the font size changes, or when the viewport is resized.

One possible caveat for "real" inserted markup is that this may break authored CSS selectors that have strict structural dependencies (i.e. it should work fine in most cases such as class names, but ">" would break).

danielweck · 2014-11-04T16:58:30Z

For the sake of completeness, I should also point out that in the context of EPUB3 Media Overlays (talking / readaloud books with synchronised text / audio) the "highlighter" implements experimental support for CFI character ranges (i.e. the SMIL does not need to reference the IDs of target HTML elements).

This functionality relies on a library called "Rangy", which provides a utility that inserts necessary span element around a text selection, using a given CSS class name. This process temporarily produces CFI-breaking markup within the HTML document (i.e. for the duration of the current audio/text playback), so in theory the bookmark / annotation tool should not be invoked at that point in time. In practice, this is an extremely rare edge-case as there are currently no Media Overlays ebooks that use this feature, and there is not even a clear IDPF specification for it. My general suggestion would be to disable the bookmark / annotation commands during MO playback.

In a future version of Readium, "audio bookmarking" would be a desirable feature anyway, and would require an alternative framework (to preserve not only the currently-playing SMIL fragment, but also the time offset within the associated audio clip ... as you can see, this would be irrespective of the currently-highlighted DOM element / character range).

Code references:

https://github.com/readium/readium-shared-js/blob/develop/js/views/media_overlay_element_highlighter.js#L319

https://github.com/readium/readium-shared-js/tree/develop/lib/rangy

marcuswu · 2014-11-12T15:51:00Z

At BiblioLabs we have been using a different approach to highlighting. We had issues with properly calculating positioning for overlays in certain conditions. They also do not provide the ability to perform text selections on the content while displayed (for instance, selecting text for creating an annotation / note while book text search results are highlighted).

The tests fail due to the difference in how blacklisting works with the patch applied. I can look into updated tests if the approach at blacklisting is desired.

dannyvenier · 2014-11-12T16:30:27Z

Hi Marcus,

I'm a web developer and I'm working on an e-Reader project. I saw your comments on the Readium/idpf forum on CFI Element blacklisting and wanted to comment. Unfortunately, I can't seem to get a Readium forum account to comment so I went here.

We have been undertaking to implement a highlight feature and can infer from the public APIs on readium that the "preferred" approach is to use the API to add a highlight, which adds a div element (or group of div elements, one per line) outside of the document body and some zero length span elements at the start and end point of the selected text of the highlight. This approach works okay, but the lack of ability to get a reference to the colored div is causing us lots of technical challenges.

Probably like you, we concluded that a better approach would be to wrap the document's selected text in a span and use css styles on the span to get color, hover treatment etc. Partway down that path we discovered that the insertion of non zero length span elements pushes all the subsequent cfi references in that same parent out even if the span is blacklisted. Like you pointed out, the content within the span does not get blacklisted so if mucks up our cfis. We are persisting highlights to the cloud, not by changing the document dom but by saving the cfi references and annotation properties and reapplying them on spine change events or when the book is reloaded (browser client).

So, like you, the overlay positioning is pretty tricky to deal with when multiple columns are rendered and the browser window is resized. i should say, the readium code positions the highlights as expected, but dynamic events (for instance, sidebar popped out) which changes the pagination will throw them off position and we need to refresh their position. (it doesn't seem like the highlight refresh method is exposed to us for our particular view but it looks implemented.)

There is also another problem with the overlay approach which is causing us grief. That is, a drag selection that drags over an existing highlight makes the browser select all text in the spine since the highlight div overlay is below the bottom of the body containing the content. This precludes us from having a user friendly UI for the situation when we select into another highlight.

All this to get to my point which is that we like the "embed in dom" approach, and can deal with the highlight persistence readily. Plus we have reference to our own spans or divs that hold the highlighted text so we can manage the positioning, styling and events more easily. However, this is a fork in the road, if we deviate from the Readium approach and I would wonder if we would cause ourselves grief in the future. Also, we would need the "blacklist span plus content" capability that you describe otherwise we'd be constantly refreshing the cfi tree to make sure it didn't invalidate our persisted annotation references.

Have you considered the impact of deviating, and what kind of feedback have you gotten from Readium regarding the pull request?

thanks,
Danny Venier
bitHeads Inc.

marcuswu · 2014-11-12T16:58:49Z

We did end up deviating (hence the fact that we have the new blacklist code
to issue the pull request for), and we decided that we don't like having to
maintain those differences. It makes pulling in Readium's latest code
difficult. It's actually that reason why I finally decided to issue a pull
request for the new blacklist code.

If you do end up doing any deviating work on Readium, I recommend that you
fork Readium and create a branch where all of your work gets merged to
(basically a second master branch). This way it becomes easier to pull
Readium updates into the main master branch and rebasing the secondary
master onto those changes.

What we have decided to do is to periodically issue pull requests back to
Readium -- even for things like this where it is a different approach to
something they already have. This way we'll minimize the divergence between
their main line and ours. Ideally it will not diverge at all (we'll see how
this pull request goes).

I have successfully had other pull requests merged for smaller issues --
mainly bug fixes. I usually try to change as little as possible to make my
fixes and avoid code design considerations that might spark debate. Often
times they are merged without any comment, but occasionally they do request
further changes to improve the quality of the overall code in relation to
the fix (something I take as a good sign).

So far the feedback on this particular pull request has been pretty
positive. They maintain that their method works while also recognizing the
positives in the approach we have taken. There are some broken tests due to
the different approach that I'll probably need to fix before the pull is
accepted. I just now asked whether I should take my time doing that. I'll
keep you posted on how that goes.

Thanks,
Marcus Wu
BiblioLabs, LLC

On Wed, Nov 12, 2014 at 11:30 AM, dannyvenier [email protected]
wrote:

Hi Marcus,

I'm a web developer and I'm working on an e-Reader project. I saw your
comments on the Readium/idpf forum on CFI Element blacklisting and wanted
to comment. Unfortunately, I can't seem to get a Readium forum account to
comment so I went here.

We have been undertaking to implement a highlight feature and can infer
from the public APIs on readium that the "preferred" approach is to use the
API to add a highlight, which adds a div element (or group of div elements,
one per line) outside of the document body and some zero length span
elements at the start and end point of the selected text of the highlight.
This approach works okay, but the lack of ability to get a reference to the
colored div is causing us lots of technical challenges.

Probably like you, we concluded that a better approach would be to wrap
the document's selected text in a span and use css styles on the span to
get color, hover treatment etc. Partway down that path we discovered that
the insertion of non zero length span elements pushes all the subsequent
cfi references in that same parent out even if the span is blacklisted.
Like you pointed out, the content within the span does not get blacklisted
so if mucks up our cfis. We are persisting highlights to the cloud, not by
changing the document dom but by saving the cfi references and annotation
properties and reapplying them on spine change events or when the book is
reloaded (browser client).

So, like you, the overlay positioning is pretty tricky to deal with when
multiple columns are rendered and the browser window is resized. i should
say, the readium code positions the highlights as expected, but dynamic
events (for instance, sidebar popped out) which changes the pagination will
throw them off position and we need to refresh their position. (it doesn't
seem like the highlight refresh method is exposed to us for our particular
view but it looks implemented.)

There is also another problem with the overlay approach which is causing
us grief. That is, a drag selection that drags over an existing highlight
makes the browser select all text in the spine since the highlight div
overlay is below the bottom of the body containing the content. This
precludes us from having a user friendly UI for the situation when we
select into another highlight.

All this to get to my point which is that we like the "embed in dom"
approach, and can deal with the highlight persistence readily. Plus we have
reference to our own spans or divs that hold the highlighted text so we can
manage the positioning, styling and events more easily. However, this is a
fork in the road, if we deviate from the Readium approach and I would
wonder if we would cause ourselves grief in the future. Also, we would need
the "blacklist span plus content" capability that you describe otherwise
we'd be constantly refreshing the cfi tree to make sure it didn't
invalidate our persisted annotation references.

Have you considered the impact of deviating, and what kind of feedback
have you gotten from Readium regarding the pull request?

thanks,
Danny Venier
bitHeads Inc.

—
Reply to this email directly or view it on GitHub
#19 (comment).

dmitrym0 · 2014-11-12T19:51:26Z

The original implementation of highlighting inserted markers into the DOM tree that then later were used to calculate the coordinates for highlight divs. The markers were empty spans (no children) and the divs were outside of the document content so that blacklists could be effectively applied.

Modifying spine DOM strikes me as an poor choice since much annotation functionality/highlighting functionality relies on CFIs references that are incorrect (if the DOM has been modified). Blacklisting is certainly a possible solution. Unfortunately we've seen massive performance degradation with blacklists when it comes to larger spine items. This is just traversing the DOM. An annotation implementation where whole DOM subtrees need to be reordered to apply CFIs implies an even bigger performance hit.

Moving forward we should avoid DOM modifications, if possible. That will allow elimination of blacklisting which would improve performance and certainly simplify some things.

marcuswu · 2014-11-12T20:21:37Z

I did notice some performance degradation using the original code until I changed the blacklisting. The original blacklist code iterates through a list of blacklisted classes, elements, and ids for each element that is being checked.

The approach I chose is to instead build a single selector filter from the id and class blacklist items and apply those together using jquery which should fall to the browser's selection capabilities rather than a javascript loop. This is much faster and I have not seen nearly as large performance hits with this approach.

In any case, I do agree with avoiding the necessity for blacklisting as much as possible, but that is a separate issue from having the ability to blacklist and being able to do so in a flexible way that allows content to be wrapped without being eliminated.

dannyvenier · 2014-11-12T20:43:21Z

Modifying the DOM seems to me to be a more pure and intuitive solution to modifying the view of a document versus an overlay. However, I have no background with the challenge of keeping the cfi tree "correct with blacklist exceptions". The fact that you say you've seen massive performance degradation when using lengthy blacklists is worrisome for sure. Still, there seem to be an continuing string of challenges in maintaining highlights and supporting elements for the overlay approach:

the span markers that are used have offset values relative to the spine (it seems). It is problematic to convert those offsets to pixel positions relative to an element outside the document. As I mentioned, when resetting the highlight position, I can use the readium facilities to convert the cfi into a positioned highlight (just using addHightlight) but we also may need to add context menus, note markers or other visual elements into an out-of-document absolute positioned element. The embedded span tags don't help much there without calculating column position, page length etc.because they are in two different coordinate systems.
the overlay div elements block interaction with paragraph text beneath them so any text already highlighted cannot be easily accessed. I've tried setting the z-index of the highlight div to -1 to put it below the text. This allows for non transparent colours but then I can't interact with the annotation because an annotation click is not seen by the handlers.
any changes in pagination within a spine which change the viewport dimensions result in the highlight div positions being invalid. Similarly, for large spine items, a size drag of the viewport could have large performance impacts as all the annotations in the current content need to be moved for each drag increment.

Something that might help item 1 is a public method to convert a cfi position into a document container pixel position. That might exist in rangy, not sure if it was there whether public.

Item 2 is tricky and item 3 would need some testing to determine if performance degradation was substantial for a particular client machine.

dmitrym0 · 2014-11-13T16:54:07Z

Well, CFIs specify a location within a DOM so modifying the DOM implies a subsequent effort to revise the tree so that CFIs resolve correctly. I find the whole concept challenging so my preference has been to try to think of spine DOM as constant. Not to say DOM modification approach won't work; it introduces another layer of complexity that I would prefer to avoid.

We've gotten rid of the span markers completely. We generate selection based on CFI ranges (and vice versa). We definitely have to do some math to convert from the document frame into the "container" frame, but this code is not especially complicated. We have event handlers that are really epub agnostic that notify the parent container when a highlight etc has been clicked.
This is a problem we're still struggling with. It's been on our radar for a while just not a very high priority.
Once the viewport changes, I think it's reasonable to invalidate existing highlights and reposition/recreate them as appropriate. Not immediately obvious how this can be optimized away since highlighting is a "custom" operation. We do have some optimizations that ensures only relevant content is highlighted so the performance hit during reflow shouldn't be too big.

dannyvenier · 2014-11-14T02:33:24Z

I definitely respect the depth of knowledge that went in to designing the annotation approach the way it was designed. We are reluctant to diverge from the intended approach as it would likely limit us in the future if we want to continue using the Readium framework.

So if I were to stay true to the scope of this thread, the blacklist extension, I would say that we should first try to find means to do what we want within the current approach rather than diverging. Just my opinion. (That being said, the pull request doesn't take away our ability to use the framework per the original intent.)

From the standpoint of our project, I think the biggest improvement could be made by allowing the client to reference the overlay divs. If the annotation id was placed in the element(s), that would give us a way to access the annotation div, style it, add listeners for what we needed, get the position for related controls, implement a button to hide highlights etc. We can access them now, but only using jquery class selectors for example $('.highlight') but that doesn't relate the div to the annotation. Maybe just a getHighlight(id) method that gives us access to the overlay elements would do - maybe return a highlight group object?

That wouldn't solve my issue 2. We've been able to get around issue 2 (iframe text interaction obscured by overlay div) by toggling the z-index of the div during the drag gesture that we use for text selection. But that could get messy if we want to support more gestures, like double click of text.

Regarding your statement that you've gotten rid of the span markers completely, I'd like to know more. Is that change in the dev branch or just planned at this point? Can you point me to the right place.

thanks

marcuswu · 2014-11-17T20:24:51Z

I understand the desire to move away from altering the spine DOM, and I can understand phasing that out as part of the Readium viewer project. However, blacklist use is pretty integrated into the epub cfi project.

Do you intend to deprecate and remove that functionality from the API?

If the intention is to leave it in the API but not utilize it in the viewer project, doesn't it make sense to provide the API feature in a way that makes it as useful as possible to anyone who might use the library?

danielweck · 2014-11-17T21:08:48Z

By the way, blacklisting is currently passed as function parameters in shared-js, instead of being centralised by some sort of "provider" interface. This could be made more elegant / maintainable.

e.g.

https://github.com/readium/readium-shared-js/blob/develop/js/views/cfi_navigation_logic.js#L563

https://github.com/readium/readium-shared-js/blob/develop/lib/annotations_module.js#L583

https://github.com/readium/readium-shared-js/blob/develop/js/views/media_overlay_player.js#L180

dannyvenier · 2014-11-18T19:42:07Z

I think it is very likely that will need to modify the DOM (and use a blacklist) for numerous purposes for our application. Enabling this capability need not turn Readium in to "Readium-Writeum" (...sorry) and diverge from it's original intent.

For example, the findText() method in Rangy wraps search results in span markers (albeit it would likely be only temporary). The point is, that there could be many cases where modifying the main DOM and injecting elements into it makes more sense for the particular application than finding an overlay solution.

danielweck · 2014-11-18T20:31:47Z

@dannyvenier agree about your observation regarding Rangy's approach to "highlighting". Allow me to mention my above comment (starting with "For the sake of completeness") which points out the existing EPUB3 Media Overlays CFI highlighting method (audio synchronisation with arbitrary character ranges).

dannyvenier · 2014-11-18T21:04:39Z

@danielweck

Ah, yes, you were on point. That particular point was lost on me at the time of reading as I hadn't really looked at Rangy. When I ran into the injection on the rangy equivalent to search-text the light came on. As it turns out, the highlighting of search results is likely similar to your example because it is a short term modification so another edge case. However the key point is that there are numerous features that one might try to implement where injecting into the DOM is possible, which in my mind, justifies the use and enhancement of the blacklist feature. Your suggestion of implementing it as a provider makes a lot of sense.

Dmitry mentioned in one post that the span markers for highlight have or are being completely removed. Has this happened yet and if so what branch is that in?

And is there a known timeframe for the pull request which is the subject of this thread?

danielweck · 2014-11-18T22:52:44Z

I think the start/end markers still get inserted:

https://github.com/readium/readium-shared-js/blob/master/lib/annotations_module.js#L565

https://github.com/readium/readium-shared-js/blob/master/lib/annotations_module.js#L976

marcuswu added 3 commits November 4, 2014 10:39

Different blacklist handling and bug fixes

39de8c6

* Handle node injection after the all text in a text node * Apply blacklist to a node, not its subtree * Fix a couple of other bugs

Blacklisted node text children were ignored

e1620ba

Fix blacklist on text node and infer text node fix

181cc07

danielweck added the CFI label May 27, 2015

jccr mentioned this pull request Jun 29, 2015

Hypothes.is / Annotation example integration readium/readium-js#109

Closed

rkwright added the priority high [critical] label Aug 6, 2015

rkwright removed CFI labels May 13, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Blacklist without subtree #19

Blacklist without subtree #19

marcuswu commented Nov 4, 2014

danielweck commented Nov 4, 2014

danielweck commented Nov 4, 2014

marcuswu commented Nov 12, 2014

dannyvenier commented Nov 12, 2014

marcuswu commented Nov 12, 2014

dmitrym0 commented Nov 12, 2014

marcuswu commented Nov 12, 2014

dannyvenier commented Nov 12, 2014

dmitrym0 commented Nov 13, 2014

dannyvenier commented Nov 14, 2014

marcuswu commented Nov 17, 2014

danielweck commented Nov 17, 2014

dannyvenier commented Nov 18, 2014

danielweck commented Nov 18, 2014

dannyvenier commented Nov 18, 2014

danielweck commented Nov 18, 2014

Blacklist without subtree #19

Are you sure you want to change the base?

Blacklist without subtree #19

Conversation

marcuswu commented Nov 4, 2014

danielweck commented Nov 4, 2014

danielweck commented Nov 4, 2014

marcuswu commented Nov 12, 2014

dannyvenier commented Nov 12, 2014

marcuswu commented Nov 12, 2014

dmitrym0 commented Nov 12, 2014

marcuswu commented Nov 12, 2014

dannyvenier commented Nov 12, 2014

dmitrym0 commented Nov 13, 2014

dannyvenier commented Nov 14, 2014

marcuswu commented Nov 17, 2014

danielweck commented Nov 17, 2014

dannyvenier commented Nov 18, 2014

danielweck commented Nov 18, 2014

dannyvenier commented Nov 18, 2014

danielweck commented Nov 18, 2014