-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Targeting text that spans an element boundary #72
Comments
This was indeed our intent. That said, I think that, in general, exact matching should be preferred since it preserves the targeted context even if the page changes, making it less fragile [1]. However, in our brief usage it seems that there's often cases where an exact match should seem possible but isn't. For example, it's often the case that text has some inline iconography or images. This breaks our current exact matching algorithm and requires using a range. As an experiment, I changed it to use an (internally) modified innerText algorithm: perform the exact search on In my brief testing this actually worked really well for exactly matching across complex DOM and had the benefit of being very predictable and easy to generate links. In addition, innerText is specified and interoperable and handles lots of the edge-cases we worry about already so it seems like it'd be a good choice. I wonder if this idea has been explored in the past or if there's any drawbacks I'm missing. Would appreciate any input here. [1] There's some nuance here, a long exact match might be more fragile than an equivalent range. e.g. if the author fixes a single typo in the passage. In addition, ranges are also useful to keep link size down when the intent is to target a very long passage; 1000+ char URLs are rather unwieldy without support from tools. |
I was planning on implementing this in https://github.com/apache/incubator-annotator in order to demonstrate a proof of concept for exactly this use case. Is your implementation in the browser, and not in JS? I'm very motivated to get a JS version of this to help polyfill this feature. |
One that comes to mind is the world of editors built around content editable and document schemas. For example, ProseMirror will map a position to a node. Its positions don't track |
Yeah, my implementation was in the browser. The main push back I've gotten from Blink experts in this area is that innerText doesn't account for Shadow DOM and some issues around Unicode text matching/conversion which I'm still trying to understand better. |
I'm inclined to leave things as-is for now. I think there's room to improve the matching algorithm to use an "innerText-like" algorithm (e.g. converting table cells to " ", blocks to newlines, etc.) but I'm not sure how much value it'll add vs. the effort required. The latest refresh of the spec I've done in #90 specifies the block element boundaries explicitly so I think there's nothing left to do here. |
Will it allow linking to fragments of text with links or italics (i.e. inline nodes)? For example, can I link to:
or will it be impossible? In other words: if I write an article, then somebody saves a fragment link, then I "wikify" some words in the article or make them strong/italic, will the fragment link break? |
The algorithm uses block elements as boundaries. That means
will match in your example, but
will not. However, a range based match can cross block boundaries. The second case above could be matched using |
From the find a target text algorithm steps (section 2.4.3), it seems as though the
TreeWalker
is used to iterate over text nodes and the search is performed on theirinnerText
. This seems to mean that no direct search for text that spans an element boundary would ever match.The easiest way to address this would be to say that this is intentional and that any such match should use a range fragment, specifying the start and end text.
There are maybe other ways to address it, but I think that they immediately bring complications with the
innerText
algorithm.The text was updated successfully, but these errors were encountered: