Proposal for Element.currentLang and Element.currentDir #7039

Open
claviska opened this issue Sep 8, 2021 · 51 comments
Labels
  • addition/proposal: New features or enhancements
  • i18n-tracker: Group bringing to attention of Internationalization, or tracked by i18n but not needing response.
  • needs implementer interest: Moving the issue forward requires implementers to express interest

Comments

@claviska

claviska commented Sep 8, 2021

I'd like to propose the introduction of two read-only properties on the Element object:

Element.currentLang
Element.currentDir

Element.currentLang and Element.currentDir are read-only properties that reflect the element's current language and direction, as determined by its own lang and dir attributes or, failing that, those of its closest ancestor.

The primary use case is to improve i18n in custom elements, but the benefit will also be seen by frameworks that currently use a separate, non-standard context to determine these values. Exposing the current inherited language and direction will provide better localization capabilities by removing performance hurdles and eliminating the need for additional logic and special contexts.

This information isn't currently available without expensive DOM traversal. Furthermore, selectors such as Element.closest('[lang]') will stop if they reach a shadow root, requiring recursive logic to break out of them:

// Recursive version of Element.closest() that breaks through shadow roots
function closest(selector, root = this) {
  function getNext(el, next = el && el.closest(selector)) {
    if (el === window || el === document || !el) {
      return null;
    }

    // No match in this tree: continue from the shadow host, if there is one
    return next ? next : getNext(el.getRootNode().host);
  }

  return getNext(root);
}

// closest() returns null when no ancestor sets lang, so guard the lookup
const lang = closest('[lang]', myEl)?.lang ?? '';

As a custom element author, I find it's not uncommon for users to have dozens of components on a page. It's also not impossible for a page to have multiple languages and directions. For components that require localization, the only way for them to inherit lang and dir is via DOM traversal or other non-standard logic. This, of course, isn't very efficient.

Being able to reference Element.currentLang and Element.currentDir will solve this in an elegant way using data the browser is likely already aware of.
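For comparison, here is a minimal userland sketch of the lookup such a property would replace. The helper name resolveLang is hypothetical, and it only considers the plain lang attribute; it assumes, as the proposal does, that the value should pass through shadow roots.

```javascript
// Hypothetical helper, not a platform API: approximates what the proposed
// Element.currentLang could return by walking up the tree.
function resolveLang(el) {
  let node = el;
  while (node) {
    // Only elements carry attributes; documents and shadow roots do not.
    if (typeof node.getAttribute === 'function') {
      const lang = node.getAttribute('lang');
      if (lang !== null) return lang;
    }
    // Step to the parent; at a shadow root, hop to its host instead.
    node = node.parentNode || node.host || null;
  }
  return ''; // no ancestor declares a language
}
```

Every call repeats the walk, which is the per-component cost the proposal is trying to avoid.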

Additional thoughts:

  • It seems pragmatic to expect lang and dir to pass through shadow roots. If desired, the custom element author can override it by applying lang or dir to the host element or to any element within the shadow root.
  • This proposal doesn't address a way to listen for language/direction changes. This would be incredibly useful, but probably out of scope for discussion within this group.
  • Interestingly, this is something that we can do with CSS via :lang and :dir (limited support). Unfortunately, there's no clean way to discover this value with JavaScript.
@annevk annevk transferred this issue from whatwg/dom Sep 8, 2021
@annevk annevk added addition/proposal New features or enhancements needs implementer interest Moving the issue forward requires implementers to express interest labels Sep 8, 2021
@annevk
Member

annevk commented Sep 8, 2021

Moving this to HTML as DOM doesn't define language or direction. We've had discussions about this kind of feature in the past. If someone could dig those up that would be helpful.

@r12a r12a added the i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response. label Sep 8, 2021
@WebReflection
Contributor

element.matches(':lang(en)') works pretty well though, but I wonder if having those pass through SD would break expectations, especially in case people didn't test different dir values on their SD elements.

On the other hand, this problem doesn't exist without SD or built-in extends, but enabling something to go through in SD might be the beginning of tons of other requests.

@claviska
Author

claviska commented Sep 8, 2021

element.matches(':lang(en)') works pretty well though

This works if you know the language(s) being used. If the language is arbitrary, there's really no mechanism to determine it without brute forcing it (silly) or DOM traversal (expensive).

On the other hand, this problem doesn't exist without SD or built-in extends

This problem is not exclusive to custom elements with shadow roots. You'd still need DOM traversal to reliably use the current language or direction of any element. A non-custom element use case may be a library or framework that handles localization and would prefer to use the platform-provided lang instead of a specialized context.

enabling something to go through in SD might be the beginning of tons of other requests.

I'd argue that localization is fairly unique and shouldn't be reset. At least, I can't think of a single use case where localization shouldn't persist until explicitly changed in the DOM tree.

@WebReflection
Contributor

WebReflection commented Sep 8, 2021

lang and dir are not coupled though, but I agree indeed there's no way to know these directives without some JS seppuku.

however, dir is usually language dependent, and if document.documentElement.lang is an empty string, we are in trouble, but otherwise it's relatively trivial to know if a well-presented HTML page has a language preference.

mapping lang to a dir is not a trivial task, and if the browser knows how it should behave according to either lang or user settings, it would be awesome to expose that. Yet I believe any hook on the document would do, as anything else would likely violate the user lang/dir preferences (just trying to keep this proposal simple enough, and yes, it's useful for my daily use-cases too).

@claviska
Author

claviska commented Sep 8, 2021

It's worth noting that there can be multiple languages in an HTML document, so referencing document.documentElement.lang isn't a reliable solution. Some examples:

  1. Displaying excerpts in other languages
  2. Displaying quotations in other languages
  3. Things like: <p>The word for "hello" in Spanish is <span lang="es">Hola</span></p>
  4. <time datetime="2021-09-06 18:20" lang="es">[time formatted in es locale]</time>

It would be useful for libraries, utilities, and child components to be self-aware of the intended language and direction so they can render with the correct locales.

@WebReflection
Contributor

fair enough, then I guess the currentX proposal is desirable (I got confused with the i18n language-related API; your use cases are indeed relatively common).

@WickyNilliams

for direction, it seems you can at least do this:

getComputedStyle(element).direction

seems to work even in scenarios where elements are nested with different values, including being implicitly inherited from some ancestor. see here https://codepen.io/WickyNilliams/pen/QWqgXOQ

though I still think a dedicated property is useful.

I would say there should even be some way to observe changes to these values. If an element gets re-parented, or some (unknown) ancestor has its lang/dir values change, aside from polling (ugh) there would be no way to know and react to such a change

@claviska
Author

getComputedStyle(element).direction

Good tip. I believe this will trigger a reflow, though, so a cached property would be preferred.

I would say there should even be some way to observe changes to these values.

I agree. Perhaps an event similar to languagechange would be helpful, but I don't want to bloat the initial proposal. It's also worth noting that lang and dir both reflect, so a mutation observer could be used to detect such changes in the interim.
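A rough sketch of that interim approach, assuming a single tree with no shadow roots involved. The helper name watchLangDir is made up for illustration, and the ObserverCtor parameter exists only so the sketch can be exercised outside a browser; the observe() options themselves are the standard MutationObserver ones.

```javascript
// Sketch: report lang/dir attribute changes anywhere beneath `root`.
// `ObserverCtor` defaults to the real MutationObserver when one exists.
function watchLangDir(root, onChange, ObserverCtor = globalThis.MutationObserver) {
  const observer = new ObserverCtor((records) => {
    for (const record of records) {
      // record.target is the element whose attribute changed
      onChange(record.target, record.attributeName);
    }
  });

  observer.observe(root, {
    attributes: true,
    attributeFilter: ['lang', 'dir'], // ignore every other attribute
    subtree: true, // include descendants of root
  });

  // Return a cleanup function so callers can stop observing
  return () => observer.disconnect();
}
```

In a page this might be called as watchLangDir(document.documentElement, handler). It reports which element changed, not which descendants are affected, so each interested component still has to recompute its own value.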

@WickyNilliams

agreed, it is far from ideal, but a decent workaround for now.

can an MO cover all cases? what's the perf impact of observing the entire subtree from the document root? what happens if there are nested, intermediary shadow roots and the dir/lang is subject to change inside any of them? you'd have to climb up the tree and attach an observer at every root, as well as document? feels like there might be a ton of edge cases!

@claviska
Author

can an MO cover all cases?

It won't pick up attribute changes in shadow roots, so no. Each component that's interested would need to attach a separate observer to its respective shadow root, which isn't ideal.
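To make the cost concrete, here is a small sketch of finding every root a component would have to observe. The helper name rootsAbove is hypothetical; it relies only on getRootNode() and ShadowRoot.host, and it cannot see into closed shadow roots from the outside.

```javascript
// Collect the root of `el`'s tree plus the root of every enclosing tree,
// innermost first. Each one would need its own MutationObserver.
function rootsAbove(el) {
  const roots = [];
  let node = el;
  while (node) {
    const root = node.getRootNode();
    roots.push(root);
    // A ShadowRoot has a host element; a Document (or detached root) does not.
    node = root.host || null;
  }
  return roots;
}
```

A component nested two shadow roots deep would get three entries back, so it would need three observers just to notice a lang change on an ancestor.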

A composed event that bubbles up would be optimal, but that should probably be a separate proposal. But since we're here, perhaps dirchange and langchange would be reasonable candidates for event names.

@jakearchibald
Contributor

I think observability is a must - that's the trickiest part of this, and it's something browsers already implement in order to support :lang() and :dir() selectors.

I presented a couple of options in #9918:

Option 1

Extend MutationObserver (or create a new LanguageObserver) to allow for observing currentLang. It feels like this should be an observer rather than an event since it's so closely linked to DOM changes.

Option 2

Provide a way to observe changes in CSS selector matching.

const result = element.matchSelector(`:lang(${element.currentLang})`);

result.addEventListener('change', () => {
  const newLang = element.currentLang;
  // …
});

This is based on window.matchMedia, but matches a selector.

@WickyNilliams

interesting ideas.

extending mutation observer seems off to me, since MO is concerned with sub-trees, whereas lang/dir are the opposite (comes from above). is there precedent for an observer which works that way?

matchSelector feels like a broadly useful API, even outside of this use case - curious if that harms or helps the chances of getting this through?

@keithamus
Contributor

We could also split the difference between LangObserver/matchSelector and look at something like SelectorObserver.

@jakearchibald
Contributor

@WickyNilliams

extending mutation observer seems off to me, since MO is concerned with sub-trees, whereas lang/dir are the opposite (comes from above).

It's only concerned with subtrees if you opt into that, otherwise it's just concerned with the element being observed. But you're right that none of the values it observes are computed.

is there precedent for an observer which works that way?

Intersection and resize observers observe computed values that are impacted by things all over the tree.

matchSelector feels like a broadly useful API, even outside of this use case - curious if that harms or helps the chances of getting this through?

I agree it would be generally useful, however it might not be the best fit for this use-case. The example I gave only observes one change - you'd need to un-observe and observe the new value each time. Not too tricky though.

The potentially trickier issue is timing. Something like matchSelector wouldn't signal its changes until style calculation, which feels wrong for something like lang, which relates to content semantics rather than style. But if that's the timing browsers use for updating <input> etc, fine.

@jakearchibald
Contributor

Hmm, browsers don't seem to respect the element's language when it comes to <input>.

@WickyNilliams

It's only concerned with subtrees if you opt into that, otherwise it's just concerned with the element being observed. But you're right that none of the values it observes are computed.

sorry yes, i used an overloaded term with sub-trees. i meant whether entries are derivations of the element/its descendants vs an element/its ancestors. i guess if there is an accompanying computed property on the element, then conceptually such an observer doesn't differ. makes sense re: IO/RO.

you'd need to un-observe and observe the new value each time.

hmm yes, that would be quite an awkward API.

Something like matchSelector wouldn't signal its changes until style calculation, which feels wrong for something like lang, which relates to content semantics rather than style.

might the timing issues cause any temporary inconsistent states? e.g. i'm thinking of a case where i change to hebrew as a lang and rtl as dir - could you end up with hebrew shown in a LTR layout, or the previous content in an RTL layout? either visually, or from the perspective of running code. i'm not familiar enough with browser internals to understand the implications

@jakearchibald
Contributor

might the timing issues cause any temporary inconsistent states? e.g. i'm thinking of a case where i change to hebrew as a lang and rtl as dir - could you end up with hebrew shown in a LTR layout, or the previous content in an RTL layout? either visually, or from the perspective of running code.

The browser would have to calculate styles in order to render, and that would trigger the observer, so I don't think that's a problem.

If a tab is "not visible" (therefore not generating frames, therefore not calculating style), running code (eg setInterval) could observe that currentLang has changed but the content hasn't.

You'd still get a bit of that with an observer, since it'd be offset by a microtask, but tying it to rendering seems confusing.

Observers don't need to be tied to microtasks fwiw. Mutation observers are tied to microtasks, whereas resize/intersection observers are tied to rendering.

It feels like this should be timed similar to mutation observers, since it's DOM mutations that cause lang to change.

@rajsite

rajsite commented Nov 8, 2023

Hmm, browsers don't seem to respect the element's language when it comes to <input>.

Kinda related, something we went around in circles on was trying to understand the difference between:

  • lang attribute
  • navigator.language / navigator.languages
  • new Intl.NumberFormat().resolvedOptions().locale
  • Content-Language header of the document (not sure how it manifests in JS)

In the end what we want to know is when should we as authors use:

  • what the server says is the intended locale for the document (maybe not actually useful?)
  • what the user says in browser settings what their preferred locale is (let them input in preferred locale maybe...)
  • what the current operating system locale is (maybe not useful to know directly / should rely on user browser setting)
  • what the current element context calculates the locale is, i.e. current value of lang cascade (seems the most useful)

@claviska
Author

claviska commented Nov 8, 2023

As the OP, I want to point out that I think @jakearchibald's proposal for matchSelector() is superior to my initial proposal in that it solves both getting the current language/dir and observing changes.

Consider this my vote for that as an alternative to the aforementioned properties. Additionally, the use cases for el.matchSelector() extend well beyond custom element localization.

To recap, from Jake's post above:

const result = element.matchSelector(`:lang(${element.computedLang})`);

result.addEventListener('change', () => {
  const newLang = element.computedLang;
  // …
});

@WickyNilliams

WickyNilliams commented Nov 8, 2023

A complete solution using a hypothetical matchSelector:

let result = element.matchSelector(`:lang(${element.currentLang})`);

function handleChange() {
  const newLang = element.currentLang;
  result = element.matchSelector(`:lang(${newLang})`);
  result.addEventListener('change', handleChange, { once: true });
}

result.addEventListener('change', handleChange, { once: true });

It's quite awkward having to remember to cleanup and attach a new listener on every change imo. Of course this could be cleaned up a little, but it's just to demo it's not as easy as the snippet above
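One way the re-subscription could be hidden behind a helper is sketched below. Both matchSelector() and currentLang are the proposals discussed in this thread, not shipped APIs, and observeLang is a made-up name. The sketch also assumes currentLang is never an empty string, since `:lang()` with no argument would not be a valid selector.

```javascript
// Hypothetical wrapper: calls `callback` with the new language each time it
// changes, and returns a function that stops observing.
// Relies on the proposed element.matchSelector() and element.currentLang.
function observeLang(element, callback) {
  let result = null;
  let stopped = false;

  function subscribe() {
    // Watch the selector for whatever the language is right now
    result = element.matchSelector(`:lang(${element.currentLang})`);
    result.addEventListener('change', handleChange, { once: true });
  }

  function handleChange() {
    if (stopped) return;
    callback(element.currentLang);
    subscribe(); // the old selector no longer matches, so watch the new one
  }

  subscribe();

  return () => {
    stopped = true;
    result.removeEventListener('change', handleChange);
  };
}
```

The bookkeeping is still there, just moved out of sight, which is arguably a point in favour of a dedicated observer or event.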

@rajsite

rajsite commented Nov 29, 2023

Another difficult to track ancestor influenced state that would be useful to observe for changes would be isContentEditable based on contenteditable configuration.

Recently ran into an issue where we would like to have that propagate into the shadow root of a custom element which is currently blocked from propagating on its own. So we are trying to investigate ways to observe the state and reflect it in the shadowroot manually.

@jakearchibald
Contributor

@rajsite that feels different to this issue. Can you file a new issue for your request?

@WickyNilliams

WickyNilliams commented Mar 26, 2024

Now that the :dir pseudo-class has pretty decent support, it's at least easy to get the resolved dir of the current element via matches, which i imagine is cheaper than my previous approach of using getComputedStyle

const isLTR = someElement.matches(":dir(ltr)")
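Wrapped up as a tiny helper (resolvedDir is a hypothetical name), this gives a string that is always ltr or rtl, assuming the browser supports the :dir() pseudo-class:

```javascript
// Returns the resolved direction of `el` as 'ltr' or 'rtl'.
// :dir() matches on the resolved directionality, so dir="auto" is never returned.
function resolvedDir(el) {
  return el.matches(':dir(rtl)') ? 'rtl' : 'ltr';
}
```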

Still, being able to observe this would be nice.

@LeaVerou

LeaVerou commented Sep 26, 2024

We discussed this today at TPAC in a breakout with @dbaron @jyasskin @fantasai (who also asked @annevk @r12a).

Originally, the sentiment for computedLang/computedDir was "seems reasonable, should be trivial to implement since the browser already tracks this, all that’s needed is patches for the spec and UAs".

However, after thinking about it some more, some folks were worried that once computedLang becomes a thing, authors are going to try and do naive language parsing like el.computedLang === "en" || el.computedLang.startsWith("en-"). Language parsing is full of complicated edge cases and authors should not be rolling their own, so people felt this could be a footgun.
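As one small illustration of the footgun, the sketch below compares the naive check with one built on Intl.Locale, which already exists in JavaScript. The helper names naive and hasLanguage are made up for this example, and this covers only one of the edge cases (casing and malformed tags), not language matching in general.

```javascript
// The string check from the paragraph above
const naive = (tag) => tag === 'en' || tag.startsWith('en-');

// Returns true/false for well-formed tags, or null when `tag` cannot be
// parsed as a language tag at all.
function hasLanguage(tag, language) {
  try {
    // Intl.Locale canonicalizes casing before exposing the language subtag
    return new Intl.Locale(tag).language === language.toLowerCase();
  } catch (err) {
    if (err instanceof RangeError) return null; // malformed tag
    throw err;
  }
}

console.log(naive('EN-us'));             // false: the casing defeats the string check
console.log(hasLanguage('EN-us', 'en')); // true
console.log(hasLanguage('1test', 'en')); // null: not a well-formed tag
```

Language tags are case-insensitive, so a lang attribute written as EN-us is legitimate, yet the string check misses it.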

Ideas discussed were:

  1. Decoupling: el.computedLang returns a string, and can ship earlier, but there is a separate class or utility method whose constructor accepts a language string and parses it into its components (which are defined by Unicode).
    • Pro: el.computedLang can ship earlier and is not blocked on the more complicated language parsing API
    • Pro: For performance sensitive use cases, the object creation can be deferred until actually needed, rather than having every lookup result in object creation
    • Con: The footgun of having a string that authors use string manipulation on is still there
    • Pro: The new class can be used to parse languages more generally, without having to set them on a dummy element first
    • Con: Unclear how to monitor changes, we’d need another API for that
    • Pro: Consistent with element.lang / element.dir
  2. el.computedLang returns an object with the language components parsed (as well as the whole string). Has a toString() method that returns the whole string, so that it can still be used as a string. Or it could have a matches() method to facilitate comparison.
    • Pro: No footgun, doing the right thing is as easy as doing the wrong thing
    • Con: The string version cannot ship independently and is blocked on the more complicated API (unless we just ship an object with a single property and add more properties later)
    • Con: More complicated to spec, will take longer
    • Pro: Extensibility. We can later add methods/properties that facilitate observing changes or any other utility. Or the object itself could even be an EventTarget that fires change events.
    • Con: Inconsistent with element.lang / element.dir

During the discussion I was of the opinion we should do 1, but now I’m leaning towards 2, potentially with a very bare-bones object at first. We could later make such objects constructible to address use cases not connected to the DOM (e.g. navigator.language).

@jakearchibald
Contributor

URL exposing APIs follow option 1

@keithamus
Contributor

In addition the Intl. objects require a string so I would prefer option 1.

I sympathise with the point of users rolling their own parsing but I'd suggest users shouldn't be rolling their own i18n, let alone parsing, and Intl APIs are provided for exactly this. It's just not trivial to get the computed lang to do so.
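A minimal sketch of that hand-off, assuming the language string has already been obtained somehow (for example from the proposed computedLang). The helper name formatIn is hypothetical; the Intl calls are existing APIs. An empty or malformed tag falls back to the runtime's default locale rather than throwing.

```javascript
// Formats a date and a number for the given language string.
function formatIn(lang, date, number) {
  let locales; // undefined means "use the runtime default locale"
  try {
    locales = lang ? Intl.getCanonicalLocales(lang) : undefined;
  } catch {
    locales = undefined; // malformed tag: fall back instead of throwing
  }
  return {
    date: new Intl.DateTimeFormat(locales, {
      dateStyle: 'long',
      timeZone: 'UTC', // fixed zone so the result does not depend on the host
    }).format(date),
    number: new Intl.NumberFormat(locales).format(number),
  };
}

const when = new Date(Date.UTC(2021, 8, 6));
console.log(formatIn('es', when, 1234.5));
console.log(formatIn('', when, 1234.5)); // default locale
```

The Intl constructors also accept the raw string directly; the canonicalization step is here only to make the fallback explicit.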

@annevk
Member

annevk commented Sep 26, 2024

To what extent is language even parsed internally versus just a string being forwarded? Very much suspect it's the latter. E.g., https://software.hixie.ch/utilities/js/live-dom-viewer/?%3Cdiv%20lang%3D1test%3Edfsdfs%0A%3Cstyle%3E%0A%3Alang(%5C31test)%20%7B%20background%3Alime%20%7D%0A%3C%2Fstyle%3E (although for some reason this doesn't work properly in Firefox). (Whereas language tags proper are supposed to start with two, three, or five to eight letters as I understand it.)

@LeaVerou

@keithamus ooh that’s a good point. I’d argue it should be an Intl object that even does the string parsing if that is needed.

@annevk I think @fantasai was referring to this sort of thing: https://dabblet.com/gist/08224b9a69810f6fbffe00dbd1181dca

:lang(en) { border: 2px solid blue }
<div lang="en-us">I’m blue</div>

@annevk
Member

annevk commented Sep 26, 2024

Ah, that makes sense, but I'm relatively sure that parsing happens on top of the computed language. And maybe that explains why browsers have different answers for my test case, although https://www.rfc-editor.org/rfc/rfc4647#section-3.3.2 seems pretty clear that it ought to work.
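For reference, the extended filtering steps from that RFC section can be sketched in a few lines. This is an illustrative reading of the algorithm, not how any browser implements :lang(), and the function name extendedFilter is made up.

```javascript
// Sketch of RFC 4647 section 3.3.2 (extended filtering).
// Returns true when language tag `tag` matches language range `range`.
function extendedFilter(range, tag) {
  const r = range.toLowerCase().split('-');
  const t = tag.toLowerCase().split('-');

  // The first subtags must match, unless the range starts with a wildcard.
  if (r[0] !== '*' && r[0] !== t[0]) return false;

  let ri = 1;
  let ti = 1;
  while (ri < r.length) {
    if (r[ri] === '*') {
      ri++; // a wildcard matches any run of subtags
    } else if (ti >= t.length) {
      return false; // the tag ran out before the range did
    } else if (r[ri] === t[ti]) {
      ri++;
      ti++;
    } else if (t[ti].length === 1) {
      return false; // never skip past a singleton such as "x"
    } else {
      ti++; // skip a tag subtag that the range does not mention
    }
  }
  return true;
}

console.log(extendedFilter('en', 'en-us'));      // true
console.log(extendedFilter('de-DE', 'de-Latn-DE')); // true
console.log(extendedFilter('de-DE', 'de-x-DE'));    // false
```

Under these steps a range of 1test matches a tag of 1test, since the algorithm only compares subtags and never checks that they are well-formed.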

@fantasai
Contributor

fantasai commented Sep 26, 2024

Wrt #7039 (comment) my concern was that authors who want to do matching would be seduced by the simplicity of doing string-based language matching themselves, so we should make it really obvious and trivial to do the right thing. If we only provide a string accessor, they'll be tempted to use string methods for language matching. Even if we provide a utility class or utility method somewhere else that they could optionally use, they won't know that they should be using it (or using :lang()) because they won't know that language matching is complex, and therefore won't know that they need to go dig up such a utility method from some other jar of APIs.

So I don't have any strong opinion on how; it could just be providing a very simple .computedLangMatches() method that shows up in the API docs right next to .computedLang, but I think we need to shove the right answer for correct language matching in the face of anyone who finds .computedLang.

@LeaVerou

Another factor we’ve missed in the earlier discussion is tracking changes, which is a hard requirement for this to be useful. So it looks like there are two designs:

  1. element.computedLang returns a string. A separate method can be added later to compare this to other strings (e.g. element.computedLangMatches()), and a languagechange event can fire on the element if the computed language changes.
  2. element.computedLang returns an object which is an EventTarget. A matches() method on that object can compare languages, and a change event can be fired on it when the computed language changes.

I don’t have a strong preference between the two. Both can be shipped incrementally (e.g. the object could just have one property at first). 1 might be more performant, and given how heavy DOM nodes already are, that is important. It also seems a little more consistent with how the rest of the DOM behaves, and a bit easier to spec. 2 could end up a little cleaner in the end and the object can be repurposed to work outside the DOM too.

@annevk what do you think?

@WickyNilliams

Will any decision here relating to computedLang equally apply to computedDir? Just checking because it's not had as much discussion

@annevk
Member

annevk commented Sep 27, 2024

Given that Element is already an EventTarget and Element is impacted by the language change, it seems most natural to dispatch the event there. The alternative would be something new for mutation observers. Which we'll need some integration with of sorts either way, at least internally, as it needs to match mutation observers timing-wise.

@LeaVerou

LeaVerou commented Sep 27, 2024

Talking with @annevk this morning, it looks like we’re both in agreement that the shallower design (Option 1) is better for a variety of reasons (consistency, performance, simplicity). So hopefully we could resolve on that later today.

Will any decision here relating to computedLang equally apply to computedDir? Just checking because it's not had as much discussion

I imagine so, though that’s less of a pain point, as it’s easier to figure it out today.

@aphillips
Contributor

I18N was just now wondering whether el.computedDir returns auto or the actually computed direction (ltr or rtl) for the element. E.g. what does <span dir=auto>مصر</span> have as its computedDir? auto or rtl?

@LeaVerou

@aphillips I would imagine it always returns either ltr or rtl, never auto.

@aphillips
Contributor

@LeaVerou That's what we thought, but wanted to capture the question. The language discussion I have added to the TPAC I18N+WHATWG discussion scheduled for later today (2024-09-24)

@LeaVerou

Yup, thanks for adding it, I plan to be there! I'd need to leave the CSS WG meeting to join. Based on what @past said, I'd estimate we won't get to it before 5pm, so I’m planning to drop by around that time.

@keithamus
Contributor

Just to confirm we're looking at currentLang() and currentDir() now?

@LeaVerou

Probably more like element.getComputedLang() / element.getComputedDir(), though it’s unclear to me if the resolution is to make it a method or whether that’s still up for debate.

I will point out that the TAG principle that @domenic referenced simply says:

Getters should not perform any complex operations.

Even if this is not cached for some reason, I'd argue that return this.lang || this.parentNode.computedLang is pretty non-complex 🙃
But also wouldn't want to hold the feature back to bikeshed this further, if @domenic feels so strongly that it should be a method, whatevs.

@annevk
Member

annevk commented Sep 28, 2024

I really like the names @keithamus suggests. Not really a fan of following the really elaborate naming of getComputedStyle().

As for getters vs methods, the question is really whether you consider tree traversal a complex operation. It can definitely be costly and is somewhat non-trivial as xml:lang also has to be taken into account and you need to look at the namespace of the element as to whether the lang attribute is applicable (which means that this.lang is probably never quite correct as it only reflects the lang attribute). It definitely feels much more like the offsetTop example than returning an existing internal field to me.
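A sketch of the per-element part of that lookup, to show why this.lang alone falls short. The helper name declaredLang is hypothetical, and the rule it encodes (xml:lang takes precedence, plain lang only counts on HTML-namespace elements) is a simplified reading of the HTML rule; other namespaces may define their own handling.

```javascript
const XML_NS = 'http://www.w3.org/XML/1998/namespace';
const HTML_NS = 'http://www.w3.org/1999/xhtml';

// Returns the language declared on `el` itself, or null if it declares none.
// The ancestor walk would call this at each step.
function declaredLang(el) {
  const xmlLang = el.getAttributeNS(XML_NS, 'lang');
  if (xmlLang !== null) return xmlLang; // xml:lang wins when both are set

  if (el.namespaceURI === HTML_NS) {
    return el.getAttribute('lang'); // null when the attribute is absent
  }
  return null; // plain lang on a non-HTML element is ignored in this sketch
}
```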

@jakearchibald
Contributor

It seems odd to have a change event for a value that requires a function call to get. Or would the value be provided on the event object?

@keithamus
Contributor

The event wasn’t really discussed but it seems reasonable to me that it would include the value (and possibly the old value).

I will point out that the TAG principle that @domenic referenced simply says:

Getters should not perform any complex operations.

Aside: perhaps it would be useful for the TAG to provide a more objective rubric for this. While implementations shouldn’t guide design IMO, having this closer to the reality of what is already available in an object's data vs what needs to be computed might be a more objective measure than the term “complex”.

@LeaVerou

@annevk getComputedStyle() and offsetTop have side effects. currentLang/computedLang does not.

@jakearchibald
Contributor

jakearchibald commented Sep 30, 2024

The TAG also encourages against putting data in events that could be a simple getter on the target object. But, if doing it that way is a performance issue / breaks other TAG rules, fair enough.

Are we sure it isn't something the UA just caches internally already?

@keithamus
Contributor

It's not cached internally (unless you count the content attribute map a cache). The implementation does a flat-tree traversal + content attribute retrieval which are both (for certain values of) "fast" lookups that happen a lot. If I were to implement in Chrome (which I'm happy to do when we resolve the design) I think I would just call out to ComputeInheritedLanguage, hopefully y'all can follow the implementation and notice that it's not doing a cache, and it is walking the flat tree (note ParentOrShadowHostNode). AIUI it would be side effect free though.

I wonder; would there ever be scope to include an options bag to the method call which would preclude using a getter? Does that steer the conversation one way or another?


As for the events, perhaps it would be useful to put energy into whatwg/dom#1225 which is a more generic solution for observing style/selector invalidation that could easily map to observing such changes and gives us the facility to use the |= attribute value selector which grants more flexibility in observing changes to language families. I think I mentioned this back in Nov (#7039 (comment)).
