(cpp) Fix highlighting of unterminated raw strings #2261
I'll play around with it a bit. Nice to see this was so easy to accomplish. I'm going to explore going broader though:
{ // heredocs
- begin: /<<[-~]?'?(\w+)(?:.|\n)*?\n\s*\1\b/,
- returnBegin: true,
- contains: [
- { begin: /<<[-~]?'?/ },
- { begin: /\w+/,
- endSameAsBegin: true,
- contains: [hljs.BACKSLASH_ESCAPE, SUBST],
- }
- ]
+ begin: /<<[-~]?'?(\w+)/,
+ end: /(\w+)\b/,
+ onBegin: function(m, state) { state.heredoc = m[1] },
+ onEnd: function(m, state) { return state.heredoc === m[1] },
+ contains: [hljs.BACKSLASH_ESCAPE, SUBST],
}
]
};
Though now I'm wondering if, instead of more keys, our keys should get more complex:
{ // heredocs
begin: {
re: /<<[-~]?'?(\w+)/,
callback: function(m, state) { state.heredoc = m[1] },
},
// ...
end: hljs.withCallback(/(\w+)\b/, function(m, state) { return state.heredoc === m[1] }),
contains: [hljs.BACKSLASH_ESCAPE, SUBST],
}
Very open to thoughts on naming. The PR isn't quite right, we'd need lots of thought and additional regarding how this works with
And already you have some good questions about nesting and when both end rules are the same (other than the callback)... if the first callback returns "false" is that a blocker, or does the chain of possible ends continue to be inspected moving up the parents until it finds one that might hit? These same type of questions would likely apply to the
I MUCH prefer a generic solution here though because I very much dislike the idea of adding specific solutions to the parser each time we come across some weird language nuance. For example that's what we did with
@egor-rogov Any high-level thoughts? Not looking for code critiques at this point but more of the idea conceptually... of adding callbacks to allow for this type of thing, etc...
The callback itself
The callback being proposed takes a normal [almost] regex match object (so it can look at sub matches, etc)... the second variable starts as an empty object and can be used by rules to preserve state between the begin and end rules, such as saving the magic "token" in this case.
Also note this is really almost EXACTLY the same problem as "endSameAsBegin", but because that solution was hard-coded it can't be applied to this use case... just another reason why I dislike one-off grammar "hack" rules vs more generic solutions that can be used more widely.
Absolutely. I'd gladly sacrifice
I like the idea! Callbacks look promising.
The idea is to build out a flexible plugin/event system. I feel like this comment you're making the same mistake as when
Yes, this makes sense.
I understand the intent, and I'm in no way against it. Just wanted to imagine a situation in which onBegin/onEnd can be useful.
My latest thoughts on this include a response object:
{
begin: /(?:u8?|U|L)?R"([^()\\ ]{0,16})\(/,
end: /\)([^()\\ ]{0,16})"/,
onBegin: function(m, state, resp) { state.heredoc = m[1] },
onEnd: function(m, state, resp) { if (state.heredoc !== m[1]) resp.ignoreMatch() },
}
It's possible you could have
I think I could use help with the naming. Examples of both:
And Meanwhile an "ignore" match would ignore class (because of the I'm thinking perhaps we should only expose the ignore behavior, but still I could use some better names just for internal use and talking about these things. Abort is very strange IMHO because typically rules that FAIL to match do NOT eat content... but the prior behavior of beginKeyword (and maybe future behavior) means this is something we have to think about or at least name conceptually. There is also the word
Example of usage:
// in Ruby
Object.assign({
begin: /(\w+)/, end: /(\w+)/,
contains: [hljs.BACKSLASH_ESCAPE, SUBST],
}, hljs.END_FIRST_MATCH_SAME_AS_BEGIN)
// or perhaps
hljs.END_SAME_AS_BEGIN({
begin: /(\w+)/, end: /(\w+)/,
contains: [hljs.BACKSLASH_ESCAPE, SUBST],
});
// in C++
Object.assign({
begin: /(?:u8?|U|L)?R"([^()\\ ]{0,16})\(/,
end: /\)([^()\\ ]{0,16})"/,
}, hljs.END_FIRST_MATCH_SAME_AS_BEGIN)
And the "logic" can then simply become a shared mode snippet:
export const END_FIRST_MATCH_SAME_AS_BEGIN = {
'after:begin': (m, resp) => { resp.data.heredoc = m[1]; },
'before:end': (m, resp) => { if (resp.data.heredoc !== m[1]) resp.ignoreMatch(); }
};
Not quite as pretty as I'm open to better naming than
I really like it. Very flexible and concise.
That would be the next step. Though I'm not sure what that would mean... that we could rip it out in say v11?
export const END_SAME_AS_BEGIN = function(mode) {
return Object.assign(mode,
{
'after:begin': (m, resp) => { resp.data._beginMatch = m[1]; },
'before:end': (m, resp) => { if (resp.data._beginMatch !== m[1]) resp.ignoreMatch() }
});
};
Now I'm wondering if defaulting to the first match group is assuming too much? Hmmm...
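To make the callback flow discussed above concrete, here is a minimal sketch of how an engine might consult `after:begin` and `before:end`. The callback names and the `resp.data` / `resp.ignoreMatch()` shape come from the comments in this thread; the `findSpan` scanner is a hypothetical stand-in written only for illustration and is not how highlight.js is implemented.

```javascript
// Helper in the shape proposed in the thread: remember the first capture
// group of the begin match and reject any end match whose first group differs.
function endSameAsBegin(mode) {
  return Object.assign(mode, {
    'after:begin': (m, resp) => { resp.data._beginMatch = m[1]; },
    'before:end': (m, resp) => {
      if (resp.data._beginMatch !== m[1]) resp.ignoreMatch();
    },
  });
}

// Hypothetical toy scanner (an assumption, NOT highlight.js internals):
// returns the first span of `mode` found in `text`.
function findSpan(mode, text) {
  const begin = new RegExp(mode.begin.source, 'g');
  const b = begin.exec(text);
  if (b === null) return null;

  // Response object shared by both callbacks for this span.
  const resp = {
    data: {},
    ignored: false,
    ignoreMatch() { this.ignored = true; },
  };
  mode['after:begin'](b, resp);

  // Try each end candidate in order; skip the ones the callback ignores.
  const end = new RegExp(mode.end.source, 'g');
  end.lastIndex = begin.lastIndex;
  let e;
  while ((e = end.exec(text)) !== null) {
    resp.ignored = false;
    mode['before:end'](e, resp);
    if (!resp.ignored) return text.slice(b.index, end.lastIndex);
  }
  // No accepted end: treat the span as running to the end of the input.
  return text.slice(b.index);
}

// Heredoc patterns taken from the earlier comment in this thread.
const heredoc = endSameAsBegin({ begin: /<<[-~]?'?(\w+)/, end: /(\w+)\b/ });

console.log(JSON.stringify(findSpan(heredoc, "x = <<~EOS\n  hello FOO\n  EOS\nputs x")));
```

Note that the begin regex consumes `<<~EOS` but only the captured `EOS` takes part in the comparison, so words such as `hello` and `FOO` are offered as end candidates and ignored. Whether a real engine should keep scanning after an ignored match, or hand the candidate to a parent mode, is exactly the open question raised earlier in the thread; this sketch simply keeps scanning.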
@egor-rogov Check out the docs. Now that I had to document them I'm thinking perhaps I'm not sure we ever need such granularity as I guess I could see some crazy data transforms if we allowed that, but we could do the same thing by adding more flexibility to response... ie Does the
Just so.
Yes, I have the same feeling that Maybe that
Are all the other MODE helpers documented anywhere? It'd probably be good if they were, but if you document one you have to document all...
They are not, but END_SAME_AS_BEGIN is the first one that uses plugins and it is a good example. Anyway, that's a separate story.
I was sure that I've already approved this, but it turns out that no...
It uses callbacks, not plugins. :-) But yes separate story.
Oh, right... 🤦 There are no callback recipes yet (:
PR highlightjs#1897 switched C++ raw strings to use backreferences, however this breaks source files where raw strings are truncated. Like comments, it would be preferable to highlight them.
- Add `on:begin` and `on:end` to allow more granular matching when the end match is dynamic and based on a part of the begin match
- This deprecates the `endSameAsBegin` attribute. That attribute was a very specific way to solve this problem, but now we have a much more general solution in these added callbacks.
Also related: highlightjs#2259.
Co-authored-by: Josh Goebel <[email protected]>
Adds a mode helper to replace the deprecated `endSameAsBegin` attribute. The first match group from the begin regex will be compared to the first match group from the end regex and the end regex will only match if both strings are identical. Note this is more advanced functionality than before since now you can match a larger selection of text yet only use a small portion of it for the actual "end must match begin" portion.
- even if that existing behavior is questionable - the ending span should really close before the $$, not after Fixing this would involve delving into the sublanguage behavior and I'm not sure we have time to tackle that right this moment.
- I can never find this file because its name didn't fully match.
- rename callbacks to `on:begin` and `on:end`
@davidben Sorry for the delay on this one - and it's not exactly what you originally submitted - but I still gave you the byline for the bug fix. :-) Merging now.
@yyyc514, thoughts?
PR #1897 switched C++ raw strings to use backreferences, however this breaks source files where raw strings are truncated. Like comments, it would be preferable to highlight them.
Instead, go back to using separate begin and end regexps, but introduce an `endFilter` feature to filter out false positive matches. This internally works similarly to `endSameAsBegin`.
See also issue #2259.
(Also, I have to say, this C++ raw string syntax is a little absurd...)
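The difference between the two approaches described above can be sketched in plain JavaScript. `RAW_BEGIN` and `RAW_END` are the patterns quoted earlier in this thread; `RAW_BACKREF` is an illustrative single-regex form (an assumption, not necessarily the exact pattern from PR #1897), and `matchRawString` is a hypothetical helper that mimics the filtering idea outside of highlight.js.

```javascript
// Illustrative single-regex form with a backreference (assumed pattern).
const RAW_BACKREF = /(?:u8?|U|L)?R"([^()\\ ]{0,16})\((?:.|\n)*?\)\1"/;

// Separate begin and end patterns, as quoted in the thread.
const RAW_BEGIN = /(?:u8?|U|L)?R"([^()\\ ]{0,16})\(/;
const RAW_END = /\)([^()\\ ]{0,16})"/g;

// Hypothetical filter-based matcher: an end candidate is accepted only when
// its delimiter equals the delimiter captured by the begin match.
function matchRawString(src) {
  const b = RAW_BEGIN.exec(src);
  if (b === null) return null;
  RAW_END.lastIndex = b.index + b[0].length;
  let e;
  while ((e = RAW_END.exec(src)) !== null) {
    if (e[1] === b[1]) return src.slice(b.index, RAW_END.lastIndex);
    // Otherwise this is a false positive such as `)"`; keep scanning.
  }
  // Unterminated raw string: extend to the end of the input.
  return src.slice(b.index);
}

const truncated = 'auto s = R"foo(hello )" world';
console.log(RAW_BACKREF.exec(truncated));   // null: nothing gets highlighted
console.log(matchRawString(truncated));     // R"foo(hello )" world
```

With the backreference form, a truncated raw string never matches at all, so it would be rendered as ordinary code. With separate begin and end patterns plus a filter, the begin still matches and the string runs to the end of the input, which is the behavior the PR description asks for.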