case-sensitive access to HTML attributes (as they were written) would be useful for framework authors #1076

Open
trusktr opened this issue Oct 13, 2024 · 8 comments

Comments

@trusktr
Contributor

trusktr commented Oct 13, 2024

We cannot change how the HTML parser works, because doing so is difficult and dangerous, or so I hear.

However, I think we could easily extend it so that it additionally writes case-sensitive attribute names to new structures accessible from JavaScript, in a backwards-compatible way.

For example, we could expand Attr objects to have a new property .originalName (name can be debated) that will contain the original name as written when it was parsed.
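To make the idea a bit more concrete, here is a rough sketch of the data shape being asked for. It is a plain-JavaScript simulation, not a real DOM API: `originalName` is the hypothetical property proposed above, and `parseStartTag` is a made-up helper that only understands simple double-quoted attributes.

```javascript
// Hypothetical sketch: simulates what an HTML-mode parser could expose if it
// kept the as-written attribute name next to the normalized (lowercased) one.
// `parseStartTag` and `originalName` are illustrative names, not real DOM APIs.
function parseStartTag(source) {
  const attrs = [];
  // Matches name="value" pairs only; real HTML tokenization is far more involved.
  const attrPattern = /\s([^\s"'<>\/=]+)="([^"]*)"/g;
  for (const [, rawName, value] of source.matchAll(attrPattern)) {
    attrs.push({
      name: rawName.toLowerCase(), // what Attr.name reports today for parsed HTML
      originalName: rawName,       // the proposed addition: spelling as written
      value,
    });
  }
  return attrs;
}

const attrs = parseStartTag('<my-el .fooBar="1" onClick="2">');
console.log(attrs.map((a) => `${a.name} -> ${a.originalName}`));
```

Existing code that reads `name` would see no difference; only code that opts into `originalName` would observe the original casing.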

Use Case

People making frameworks with html tagged templates want to know the case-sensitive names as they were written in the templates.

For example, Lit's html has syntax like .fooBar= for setting JS properties, which is case sensitive. Lit currently uses RegExp instead of DOMParser to be able to read the values in a custom way that is case sensitive.
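As a very simplified illustration of that RegExp approach (this is not Lit's actual code; `boundAttrPattern`, `boundAttributeNames` and `inspect` are names invented for this sketch), a tag function can look at the static string in front of each expression and pull out the attribute name exactly as written:

```javascript
// Illustrative only; not Lit's actual implementation.
// If the static text before an expression ends in `name=` (optionally followed
// by an opening quote), capture `name` with its original casing.
// Naive on purpose: text content such as `x = ${y}` would also match.
const boundAttrPattern = /([^\s"'<>\/=]+)\s*=\s*["']?$/;

function boundAttributeNames(strings) {
  // The last string comes after the final expression, so it is skipped.
  return strings.slice(0, -1).map((chunk) => {
    const match = boundAttrPattern.exec(chunk);
    return match ? match[1] : null; // null: slot is not in attribute position
  });
}

// A tag function receives the static strings as its first argument.
function inspect(strings) {
  return boundAttributeNames(strings);
}

const names = inspect`<my-el .fooBar=${1} class="${'a'}">${'text'}</my-el>`;
console.log(names); // [ '.fooBar', 'class', null ]
```

Even this toy version has to guess at tokenizer state (quotes, text vs. attribute position), which is the kind of complexity a real parser already handles.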

However, if the DOM provided a way to read the original names as written, then it could eliminate the need for complex custom RegExp usage.

As an example of a beautifully simple html template tag, see Pota's html:

https://github.com/potahtml/pota/blob/7db5873ad7e95582c06e8810b2d9e219f299714c/src/html.js

The whole thing is 425 lines of clean, commented, spacious, easily understandable code, because it uses DOMParser which already has parsing complexities abstracted away.

And! Pota's html returns actual DOM!!

const div = html`<div>value: ${someSignal}</div>`()
console.log(div instanceof HTMLDivElement)

That's possible simply because DOMParser returns DOM! It's a thing of beauty.

The only problem with Pota's html is that it has the case-insensitivity issue, because it uses HTML mode (instead of XML mode) for particular reasons. So unlike Lit's html, Pota's html cannot distinguish .foobar from .fooBar.

If we had case sensitive access to original attribute names, Lit's html could be greatly simplified to be more like Pota's.

Solid's html is also a bit messy with its custom parsing with RegExp, despite also being considerably shorter than Lit's.

The main point is, with access to the original case sensitive names of attributes, we can simplify both Lit's and Solid's, to be more like Pota's, and also it could provide a basis for a native html template tag shipped in the browser with explainable behavior instead of bespoke parsing that is not easily doable in userland:

@trusktr trusktr changed the title case-sensitive attribute APIs would be useful. case-sensitive access to HTML attributes (as they were written) would be useful for framework authors Oct 13, 2024
@EisenbergEffect
Contributor

I like the idea of adding a new property to the Attr type to preserve the pre-normalized casing of the original HTML text. That seems like it would be completely backwards compatible. I have no idea whether that's feasible from the parser implementor's perspective, but I think it's a reasonable request to explore.

@justinfagnani
Contributor

Materializing Attr nodes is very slow. I would want something like getAttributeNames() that returned case-preserving names.

I worry about how tools will handle this. They can assume that attribute and tag names are case-insensitive and lowercase them all. It might be hard to say that they now need to preserve the case on everything. Should case-preserving only be added to <template parseparts> and hope that tools can be updated to treat just that specially?

@keithamus
Collaborator

keithamus commented Oct 13, 2024

Just some field notes on implementation complexity (not an implementer but merely a student of implementations):

All browsers do "string interning" of attribute names. Some of this work would need to be reversed for this (which may result in more heap allocations, which would also degrade performance) or delicately authored around (which would make for very complex changes and limit how plausible this would be). So it's a very complicated change to make for engines. It touches lots of parts of the codebase, many of them in places that are intensively optimized for performance. It may regress performance even in cases where the case-sensitive attribute name is never retrieved (as it still needs to be stored).

So I think there's a very high bar of value required for this change to be deemed high enough priority for browsers to take on the engineering challenge. I would imagine there are mechanisms which could alleviate the general problem while being more palatable (e.g. DOM parts).

@trusktr
Contributor Author

trusktr commented Oct 13, 2024

Materializing Attr nodes is very slow

How much slower are we talking about? Pota's html uses DOMParser, turning all attributes into Attr, and it's pretty high up there on speed:

https://krausest.github.io/js-framework-benchmark/current.html

I'm guessing it's only a template startup cost, which I think may be fine.

Arguably though we need more benchmarks with more scenarios than js-framework-benchmark, especially ones with components being created and destroyed a lot.

@trusktr
Contributor Author

trusktr commented Oct 13, 2024

All browsers do "string interning" of attribute names

So any time we access the attribute names in JS it clones them? Anytime before we access them we also make strings in JS to match against the attribute names?

What if it interns the pre-normalized name instead, if not both?

@trusktr
Contributor Author

trusktr commented Oct 13, 2024

I worry about how tools will handle this. They can assume that attribute and tag names are case-insensitive and lowercase them all. It might be hard to say that they now need to preserve the case on everything. Should case-preserving only be added to <template parseparts> and hope that tools can be updated to treat just that specially?

This is still backwards compatible: all existing apps with any tools that may have lowercased everything before sending it to the browser will still work as they did before.

People wanting the new feature would have to fix their tool, or pick a different tool. But to me this seems fine. It gives people an option that they don't have to use.

So, even if some people would have to consider switching tools, at least other people could start using it. I personally haven't encountered a tool in any of my projects that lowercased my HTML (I usually go buildless, but also I don't use SSR typically either, and html tagged templates do what they do on the clientside with no issue in my case).

@keithamus
Collaborator

So any time we access in JS it clones them?

AIUI no. My understanding: when you access it in JS, it points to the same memory (strings themselves are immutable in JS, but if you call an operation on one, e.g. toLowerCase(), you get a different immutable value back, which can be assigned to your identifier).

Anytime before we access them we also make a string in JS to match against?

During parse time the string is lowercased and compared to an interned string. The old contents are effectively thrown away, and this is the issue - we'd need to start keeping around both.
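A toy model of that, as I understand it (real engines do this in native code with tables built at compile time; `internTable`, `internName` and both parse functions are invented for this sketch):

```javascript
// Simplified model of attribute-name interning. Everything here is
// illustrative; it is not how any particular engine is implemented.
const internTable = new Map(); // normalized name -> one shared record

function internName(normalized) {
  let record = internTable.get(normalized);
  if (!record) {
    record = Object.freeze({ name: normalized });
    internTable.set(normalized, record);
  }
  return record;
}

// Today: once the name is lowercased and interned, the as-written spelling
// is not stored anywhere.
function parseAttrToday(rawName, value) {
  return { name: internName(rawName.toLowerCase()), value };
}

// With the proposal: every attribute carries an extra per-instance string,
// whether or not anyone ever reads it.
function parseAttrPreservingCase(rawName, value) {
  return { name: internName(rawName.toLowerCase()), originalName: rawName, value };
}

const first = parseAttrToday('onClick', 'x');
const second = parseAttrToday('ONCLICK', 'y');
console.log(first.name === second.name); // true: same interned record

const kept = parseAttrPreservingCase('onClick', 'x');
console.log(kept.name === first.name, kept.originalName);
```

The comparison against the shared record stays cheap in both versions; the cost of the second one is the additional string kept per attribute.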

What if it interns the pre-normalized name instead, if not both?

That's sort of not possible. Engines do compile-time interning. But not every attribute is an interned attribute name (e.g. data- attributes aren't, and you can always make up an attribute name the browser doesn't know about), so string interning isn't mandatory; it just makes things way faster.


Something I'm not sure about though. Given this:

<div foo="bar" fOo="baz" fOO="bing" FoO="qux" foO="quux">

What would be the answers to these questions:

  • What is the content attribute set to? The HTML spec currently says it should be bar.
  • Should the browser throw away all of the subsequent attributes?
  • What are you expecting to get back? If you call getAttributeNode('foo').originalName do you expect 'foo' or 'foO', or something different?
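One possible set of answers, sketched in plain JavaScript: this assumes the current first-wins rule stays as it is and that the hypothetical `originalName` simply records the spelling of the attribute that survives. `collectAttributes` is a made-up helper, and this is only one option, not a settled behavior.

```javascript
// Models the existing HTML rule that a later attribute whose normalized name
// is already present on the tag is dropped. `originalName` is the hypothetical
// addition under discussion; `collectAttributes` is an illustrative helper.
function collectAttributes(pairs) {
  const byName = new Map();
  for (const [rawName, value] of pairs) {
    const name = rawName.toLowerCase();
    if (byName.has(name)) continue; // duplicate: ignored, the first one wins
    byName.set(name, { name, originalName: rawName, value });
  }
  return [...byName.values()];
}

const result = collectAttributes([
  ['foo', 'bar'],
  ['fOo', 'baz'],
  ['fOO', 'bing'],
  ['FoO', 'qux'],
  ['foO', 'quux'],
]);
console.log(result);
```

Under these assumptions the element ends up with a single attribute whose value is bar and whose `originalName` is 'foo'; the later spellings are never stored.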

@EisenbergEffect
Contributor

I can't comment on the broad set of scenarios, but for my own use cases of building a templating engine, I think having this restricted to parseparts templates would be fine. I don't particularly have strong opinions about the API shape. If Attr is too expensive, then I'd favor whatever we can come up with that performs well.


4 participants