Document parsing of (X)HTML entities, or drop it even? #4
React has already chosen to deviate from HTML (see facebook/react#2781 and @spicyj's answer at https://www.quora.com/Why-do-I-have-to-use-className-instead-of-class-in-ReactJs-components-done-in-JSX/answer/Ben-Alpert). Therefore, I am in favour of dropping HTML entity support.
@gajus From HTML, yes; from XML, not so much (apart from JS injections).
Well, I am biased. I want JSX to allow template strings in JSXAttributeValue. The fate of that issue depends on whether HTML entity support is dropped or not. This is another consideration to weigh when deciding on this.
Do those two braces really matter so much to you that you'd change two behaviors? 😄
One is HTML entities. What's the second?
Template strings without braces, on their own.
I think that since JSX lives inside JS and is in essence syntactic sugar, ``React.createElement(`div`, {className: `foo-${foo}`}, `bar-${bar}`);`` should not be different from ``<div className=`foo-${foo}`>`bar-${bar}`</div>``.
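For reference, here is a minimal sketch of the braced form that works today and roughly what it compiles to. `createElement` is a hypothetical stand-in for `React.createElement` that only records its arguments, so the snippet runs without React; the point is that the attribute value and the child are ordinary JS template literals either way.

```javascript
// Hypothetical stand-in for React.createElement: returns a plain object describing the element.
function createElement(type, props, ...children) {
  return { type, props: props || {}, children };
}

const foo = "a";
const bar = "b";

// Roughly what `<div className={`foo-${foo}`}>{`bar-${bar}`}</div>` compiles to.
// Both the attribute value and the child are plain JS template literals.
const el = createElement("div", { className: `foo-${foo}` }, `bar-${bar}`);

console.log(el.props.className); // foo-a
console.log(el.children[0]); // bar-b
```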
Then we are back to questions like numeric literals, object and array literals, and so on.
@RReverser Explain? If I understand correctly, then yes: objects, strings, etc.

```jsx
<div foo=null />
<div foo=123 />
<div foo=() => {} />
<div foo=({}) />
```

Does this clash with anything in the spec?
It doesn't clash, but it increases complexity for a purely aesthetic reason.
That is true. But consistency and conventions lower the bug count (sorry, I have no reference for this statistic). Assuming that is true, if the rest of the code base uses convention X (template strings in this case), it would make sense for JSX to support it too.
That argument has two sides: on one hand, you increase consistency for those who write JS to develop logic; on the other, you simultaneously decrease consistency and familiarity for those who develop views (HTML/XML coders).
@sebmarkbage I'd say #28 is a candidate for otherwise keeping JSX as it is while being able to drop XHTML entities.
IMHO the problem is that it is inconsistent; it would be fine if
Dunno, maybe, but I haven't met such people yet. Right now it's pretty balanced, in the sense that most realize that
The biggest benefit of entities is that they're properly named and easy to remember. Most people know perfectly well how to write
PS. If you want
Was just typing that. Why bother with HTML entities at all?
You mean using a specific keyboard layout that allows them, or a character map application? Not all platforms and localizations have that ability out of the box.
Copy-paste from https://en.wikipedia.org/wiki/List_of_Unicode_characters.
That's genuinely what I do when my keyboard does not have a character that I need. Since it is very rare that I need a character that's not on my keyboard, it does not bother me. I cannot imagine anyone being bothered by that either.
Well, I do that as well, but it's not pleasant at all, and it's not as rare as it seems, especially for the examples above such as non-breaking spaces, em dashes and copyright signs. They are in fact much more common than
While not all platforms support character maps, I imagine that every IDE/text editor has a plugin for that (vim, Sublime, WebStorm, to name a few).
Not to mention that "regular text" is rarely typed in React code. It is something you load from a database of some sort.
http://fsymbols.com/computer/copyright/
I'm pretty sure entities aren't meant to be human-friendly first and foremost, but are simply an escaping mechanism that is charset- and implementation-independent. Regardless, I don't see how this is a problem JSX should try to solve (intentionally deviating from JS) when JS itself makes no effort.
So in any case: remove the built-in, human-friendly way of escaping and instead force developers to google/use a charmap/plugin/whatever. Degrading DX is not nice.
Often it is typed: text is exactly the thing that is rarely generated dynamically compared to the static parts of the page (user names, blog contents and numbers are, but those are a minority and have little to do with our issue and special characters). And if we accept your assumption, then this issue isn't worth discussing at all.
In that case, they would be left as
I see, this issue becomes yet another discussion of whether JSX should be sugar that is as compatible as possible with XML/HTML syntax, or whether we should slowly reduce its coverage and move towards JS. I don't buy the second way because it's no better than just using some kind of Hyperscript: if you want JS, you can write JS, but JSX is beautiful exactly because it lets you avoid some of JS's pain points when dealing with structures and content, such as non-obvious nesting and foreign-locale escapes.
No, because
If you ask me, JSX should not expand to do more than is absolutely necessary, that is, to introduce the concept of elements in a meaningful way. If we want to solve anything else, it should be considered independently and, where possible, proposed to ECMA instead, so that everyone benefits and not just the subset of code written in JSX. "Foreign-locale escapes" sounds far more useful at the level of JS.
Or namespaces, or CDATA sections, or comments… IMO there are a bunch of ways in which it deviates. I'm sympathetic to the DX argument, but IMO the best thing for DX is to keep the transformation as simple as possible. Also, the more similar JSX and XML are, the more confusing any deviation becomes.
👍
If the purpose of JSX is to be agnostic to any particular target (it's not always HTML), does it really make sense to allow HTML entities?
If we get buy-in, will we have any problems making the switch? I.e., will we risk a long-lived fork? The codemod should be safe.
Do we have any stats (or anecdotal evidence) on how widely HTML entities are used in JSX?
Or backslashes...
Oh right. I've actually broken backslashes in JSX attributes in Babel before, and it took over 7 days for someone to notice and file an issue: babel/babel#2114.
I believe that entities (or other such specifics) should be handled by the renderer, which turns the JSX output into HTML DOM or an HTML string, not by the transformer, which compiles JSX into that output.
@NekR It would then apply to all strings equally, so even user input would be subject to HTML entity decoding (besides also being a runtime cost); you definitely do not want that.
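@syranide What is user input in JSX? I did not say everything at runtime should be parsed for entities.

```jsx
class EntitiesString {
  constructor(str) {
    // Placeholder: any entity-decoding library could be plugged in here.
    this.str = myLibraryDoesHTMLEntytiesParsingHere(str);
  }
  toString() {
    // Return the decoded string stored on the instance.
    return this.str;
  }
}

<div>{ new EntitiesString(' ') }</div>
```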
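The `EntitiesString` idea above depends on a placeholder decoder. Below is a minimal runnable sketch of that opt-in runtime approach, assuming a hypothetical `decodeNamedEntities` helper with a tiny table in place of a real entity library. It also illustrates the difference from decoding every string: only wrapped values are decoded, so unwrapped strings such as user input are left alone, at the cost of doing the work at runtime.

```javascript
// Hypothetical, deliberately tiny entity table; a real library would cover the full HTML set.
const ENTITIES = { amp: "&", lt: "<", gt: ">", nbsp: "\u00A0", copy: "\u00A9" };

// Stand-in for the placeholder decoder: named references only, unknown names left as-is.
function decodeNamedEntities(str) {
  return str.replace(/&([A-Za-z]+);/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(ENTITIES, name) ? ENTITIES[name] : match
  );
}

// Opt-in wrapper: only strings explicitly wrapped are decoded, and only at runtime.
class EntitiesString {
  constructor(str) {
    this.str = decodeNamedEntities(str);
  }
  toString() {
    return this.str;
  }
}

// A renderer that stringifies children would see the decoded text for wrapped values...
const wrapped = String(new EntitiesString("&copy; 2016 Acme"));
// ...while a plain string, such as user input, passes through untouched.
const userInput = String("I typed &copy; literally");

console.log(wrapped); // © 2016 Acme
console.log(userInput); // I typed &copy; literally
```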
@NekR I interpreted that differently. IMHO what you are proposing is runtime decoding (which everyone can decide on for themselves), and it is outside this discussion about entities/escape codes in JSX source code. EDIT: That is to say, JSX needs to support escaping to some extent (like
Yes, I meant that renderers are responsible for parsing entities. One could support
Of course I am not proposing such a decoding method for JSX here; it's an implementation detail of JSX consumers. What I am saying is that parsing entities at the transpilation stage is not needed (because runtime decoding is possible), and hence it is in scope for this discussion, right?
Hmm...
IMHO no: entity parsing during transpilation and runtime decoding of entities are complementary. Runtime decoding of static source code strings in this context is inefficient and cumbersome.
Produces
Sorry, but the topic is "Document parsing of (X)HTML entities, or drop it even?" and I am saying: drop it. How is that not related? Runtime parsing was suggested as a solution. Someone who does not want a runtime solution could write a plugin that pre-parses entities into JS escapes or something like that. But you are not even listening to me. What I am saying is that it makes sense to have
This is only a problem for React, since it re-renders on every change. I use JSX in a different way and it's perfectly fine for me.
Why do we need workarounds or escaping out of JSX? Just use JS strings everywhere. I do not see any difference here, except that parsing entities at transpilation time benefits React. P.S. Interesting that you made this repository public and asked for feedback from non-React implementations, and when people come here with their opinions, you say: "This is not related". Just make it a private repository and then there's no problem with "not related".
@NekR I'm only one of many collaborators, and these are my opinions. Feel free to refute them, but there are many things to consider. If I didn't care about your opinion, I wouldn't have responded.
Decoding at compile time (source code and static strings) and at run time (dynamic strings) can both coexist and make sense. In the context of language design, the possibility of run-time decoding is not an argument against a syntax feature, nor vice versa. They are solutions to different problems. Yes, we both agree that HTML entities should be dropped; that's not what I objected to. I undoubtedly think that is the way forward, but the holes left behind by dropping HTML entities still need to be considered, and runtime decoding is not the answer.
I have seen many such arguments and decisions in TC39, but okay, if you do not accept this as an argument, never mind.
Why? Where is the big performance problem with it, apart from React's constant re-rendering?
Personally I don't think the DX argument is valid, and not because I expect everyone to use character maps, etc. JSX is JavaScript, and it doesn't really make sense that the solution, when writing JS+JSX, to "I can't type © with my keyboard" is "You can use

```jsx
<Foo
  label="I can © here"
  legal={__('This site \u00A9 2016 Acme Media Inc.')} />
```

Same code. But you can use

If this is a problem, it is a problem universal to JS and not one that should have a JSX-only fix. Rather, I think the solution is to embrace the fact that we're writing JS and fix this with JS. Specifically, given #25, I think the solution to "I can't type © with my keyboard and don't want to use a character map, copy-paste, or some other tooling" is this:

```jsx
var ent = require('character-entities');

<Foo
  label=`I can ${ent.copy} here`
  legal={_(`This site ${ent.copy} 2016 Acme Media Inc.`)} />
```
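Until template-string attribute values (#25) exist, the same approach already works with braces in plain JS. A minimal sketch, assuming a small local `ent` object in place of the `character-entities` package (assumed here to map entity names to characters) so that it runs without dependencies; in JSX the first string would be written as ``label={`I can ${ent.copy} here`}``.

```javascript
// Hypothetical local stand-in for require('character-entities'): entity name -> character.
const ent = Object.freeze({ copy: "\u00A9", nbsp: "\u00A0", mdash: "\u2014" });

// Named constant inside a template literal: readable, and no entity decoding is needed.
const label = `I can ${ent.copy} here`;
// A plain \u escape in an ordinary JS string produces exactly the same character.
const legal = "This site \u00A9 2016 Acme Media Inc.";

console.log(label); // I can © here
console.log(legal.startsWith(`This site ${ent.copy}`)); // true
```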
## Summary

Let's be faithful to the de facto behavior and document the HTML entity behaviors in the spec. Note that this is not about whether we should "drop these semantics or not", but about documenting the current behaviors that everyone has been living with for years.

### The Proposed Normative Change

I'm not aware of any precedent for specifying such transpiler/transform semantics in ECMA-262, so this is a really interesting attempt 🙂 I ended up extending `Static Semantics: SV`, which is the smartest way I could find to hack the semantics into the ECMA-262 spec. I think this should work and be accurate enough. I'm curious what implementors think about it, though.

<del>I also intentionally left the set of supported HTML entities implementation-defined to allow either the HTML4 or the HTML5 set. This may be seen as a breaking change in some regard and **this is open to discuss here**.</del> We've reached consensus that only HTML4 entities are allowed.

This commit also closes #133 by using `::` for characters that are supposed to be lexical grammars.

Close #126
Close #4

## Test Plan

Open `index.html` and proofread the spec ;)
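The normative change above boils down to decoding entity references into the string value when JSX is compiled, so the emitted JavaScript never contains entities. Here is a minimal sketch of that compile-time step, assuming a tiny subset of the HTML4 names and that decimal (`&#169;`) and hex (`&#xA9;`) references are accepted. `decodeJsxText` and `toStringLiteral` are hypothetical names; unknown names are simply left as-is here, and JSX whitespace trimming is ignored. It is not the spec's algorithm, only an illustration of the same idea.

```javascript
// Small subset of the HTML4 named entities (assumption: the real HTML4 list is much larger).
const HTML4_SUBSET = {
  amp: "&", lt: "<", gt: ">", quot: '"',
  nbsp: "\u00A0", copy: "\u00A9", mdash: "\u2014",
};

// Compile-time step: turn raw JSX text into its string value.
// Handles &name;, &#decimal; and &#xhex;. Unknown names and out-of-range code points stay as-is.
function decodeJsxText(raw) {
  return raw.replace(
    /&(?:#(\d+)|#[xX]([0-9a-fA-F]+)|([A-Za-z][A-Za-z0-9]*));/g,
    (match, dec, hex, name) => {
      if (dec !== undefined || hex !== undefined) {
        const cp = dec !== undefined ? parseInt(dec, 10) : parseInt(hex, 16);
        return cp <= 0x10ffff ? String.fromCodePoint(cp) : match;
      }
      return Object.prototype.hasOwnProperty.call(HTML4_SUBSET, name) ? HTML4_SUBSET[name] : match;
    }
  );
}

// Emit the decoded value as JS source for a string literal, escaping every character
// from U+007F upward as \uXXXX, so the generated code contains no entities at all.
function toStringLiteral(value) {
  return JSON.stringify(value).replace(
    /[\u007f-\uffff]/g,
    (ch) => "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0")
  );
}

const decoded = decodeJsxText("&copy; 2016 &amp; &#8212; &bogus;");
console.log(toStringLiteral(decoded)); // "\u00a9 2016 & \u2014 &bogus;"
```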
We should probably document how (X)HTML entities are parsed.
However, I can imagine dropping HTML entities instead and adopting the escaping used by JS strings, i.e. `bla \< \{ \u1234 bla`. To me it would make sense in many ways: `<a href="&\" />` vs `<a href={'&\\'} />` is kind of awkward.
The downside of dropping HTML entities is obviously that you wouldn't be able to copy-paste HTML, and it could be a mental disconnect for a lot of users. But I think it makes a lot of sense from a technical perspective.
I think it makes even more sense if you look beyond HTML. Why would you use HTML entities for non-HTML frontends like iOS, Qt, etc.?
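The JS-style alternative suggested above could be prototyped as a text decoder. A minimal sketch under stated assumptions: `decodeJsStyleEscapes` is a hypothetical helper that handles `\uXXXX`, `\u{...}`, and `\n`/`\r`/`\t`, and treats any other escaped character (such as `\<` or `\{`) as itself. This is not part of any JSX spec or transpiler.

```javascript
// Hypothetical decoder for JSX text that uses JS-string-style escapes instead of HTML entities.
// Supports \uXXXX, \u{X...}, \n, \r and \t; any other escaped character stands for itself.
const SIMPLE_ESCAPES = { n: "\n", r: "\r", t: "\t" };

function decodeJsStyleEscapes(text) {
  return text.replace(/\\(u\{[0-9a-fA-F]+\}|u[0-9a-fA-F]{4}|[\s\S])/g, (match, esc) => {
    if (esc.length > 1) {
      // \u{...} or \uXXXX: strip the prefix (and braces) and parse the hex code point.
      const hex = esc[1] === "{" ? esc.slice(2, -1) : esc.slice(1);
      const cp = parseInt(hex, 16);
      return cp <= 0x10ffff ? String.fromCodePoint(cp) : match;
    }
    // Single-character escapes: \n, \r, \t map to control characters; \<, \{, \\ etc. to themselves.
    return Object.prototype.hasOwnProperty.call(SIMPLE_ESCAPES, esc) ? SIMPLE_ESCAPES[esc] : esc;
  });
}

console.log(decodeJsStyleEscapes("bla \\< \\{ \\u1234 bla"));
```

Under this scheme the backslash would become the escape character in attribute strings as well, so the two-character value `&\` from the example above would be written `"&\\"` in JSX, exactly as `'&\\'` in a JS string.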