-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Syntax highlighting is broken for some languages #6668
Comments
Here are a couple others which I'm not able to trace to a specific instance of class Foo<A, B> where A: Identifiable, B: Identifiable {
var x: A
}
macro IntroduceArbitraryPeers<T>(_: T) = #externalMacro(...)
macro IntroduceArbitraryPeers<T>(_: T) = #externalMacro(...) (...)
macro IntroduceArbitraryPeers<T>(_: T) = #externalMacro(...)
macro IntroduceArbitraryPeers<T>(_: T) = #externalMacro(...)
let SE0168 = """ illegal
foo (this part should not be illegal)
illegal"""
|
@lildude Just an FYI, It might be that the location of the last match normally dereferenced by |
@Kaspik yes, it was the PR that changed the Swift grammar which first brought this to light. |
I can also report a recent issue in C++ highlighting: textmate/c.tmbundle#79 One-line comments now cause the next token to be ignored, which can cause catastrophic failures: // Here is a one-line comment
/*
* This is a multi-line comment that's
* broken because the opening token was missed.
* Now because of the apostrophe, everything
* looks like a character string.
*/
int main(); versus /*
* This is a multi-line comment that's
* not broken.
*/
int main(); The former currently renders as: EDIT: it does look like textmate C.plist is also using |
Also broken for CMake: # This is a comment
if(${higlighting} STREQUAL "it works")
endif() versus # This is a comment
if(${higlighting} STREQUAL "it works")
endif() |
Thanks, everyone, for your reports in this thread, and for your patience with these regressions. The issue is indeed the handling of |
Uh… what languages, exactly? How many? I'm interested in how many of my TextMate grammars might be substituted for a lower-quality or buggier tree-sitter version… |
FWIW, VS Code seems to support these patterns. Perhaps if a new regex/scoping implementation is needed then that team could help with integrating theirs, which seems to be actively maintained: https://github.com/microsoft/vscode-textmate
And unfortunately, it’s probably many of the highest-quality TM grammars that will need to be replaced, if these TM patterns are no longer supported 🙁 |
Also, I forgot to ask: why is there an outdated fork of PCRE being used instead of, y'know, something contemporary? This isn't a fault with the TextMate grammar format itself, but the regex engine that powers its pattern-matching. Portable grammars (or at least those that I've written) can leverage a subset of regex features that's supported by both modern PCRE, Perl, and Oniguruma. At no point should a fork be needed… |
@Alhadis We have, for some time, been using tree-sitter for C, C#, CSS, Elixir, Gleam, Go, HTML, Java, JavaScript, JSDoc, Nix, PHP, Python, CodeQL, regular expressions, Ruby, Rust, TLA+, and TypeScript. This week I’m hoping to get to C++, .gitconfig and .gitmodules, XML, and R. We were using an old PCRE fork because, as far as I can tell, neither PCRE1 nor PCRE2 provides equivalents of As to the technical reasons behind this change, tree-sitter grammars are simply a better fit for the GitHub highlighting service. TextMate grammars made sense for their original use-case, but for the purpose of our highlighting service (a non-interactive, stateless RPC server, very different from any text editor), regular expressions are not competitive, speed-wise, with tree-sitter’s generated C, and speed is one of our most important desiderata, given the scale at which this service has to operate. The speed increase when switching from a TM to TS grammar is noticeable from an end-user perspective. However, as I mentioned above, we’re looking into how we can add back support for @jtbandes If you see regressions, we’d be grateful if you’d report them so we can improve the whole tree-sitter ecosystem! VS Code is actually a bit of an exception w/r/t its continued use of TextMate grammars; neovim, Pulsar, Zed, and now Emacs are all using tree-sitter for highlighting. tree-sitter is very much the lingua franca at GitHub for anything parsing-related, and that’s unlikely to change. For what it’s worth, we’ve been very happy with the capabilities of tree-sitter for syntactically rich languages like Ruby (and these syntax highlighting queries are much more concise than the equivalent .tmlanguage JSON files; Ruby’s is only a few hundred SLOC). Tree-sitter grammars also compose well, which is great for languages like TSX where you have multiple grammars/highlighters that need to work in concert. |
Thanks for the update on your continued work on this @patrickt. |
@patrickt Is the Swift stuff using https://github.com/alex-pinkus/tree-sitter-swift or a different grammar? Let me know if any help with that migration is needed — while I have not worked on the tree-sitter side (yet!), I have intimate knowledge of the TM grammar and recently updated it with support for highlighting regex literals, as well as some other newer language features which I believe the tree-sitter grammar currently does not support. If I can be of assistance, drop me an email or DM. |
@patrickt Hypothetically, if one were to write a proof-of-concept that demonstrated how TextMate grammars could be used to provide a structured parser grammar based on a combination of their regex patterns and properties associated with scopes that capture groups are tagged with… would that be something of interest to GitHub? |
@jtbandes Thank you for your help! We’re using the @Alhadis I think that sounds extremely cool, and I’d like to see your results, but if by “of interest to GitHub” you mean “sufficient for us to prefer TextMate grammars over tree-sitter grammars,” then I’m afraid that the answer is no. If there are missing syntax-highlighting features or developer amenities in tree-sitter as compared to what TextMate provides, then that’s a bug in tree-sitter that we'd very much like to fix. |
No, I mean “sufficient enough to disincentivise mass-migration of existing TextMate grammars to potentially inferior tree-sitter versions”.
It's not a bug in tree-sitter. It's the very design of the system itself that feels over-engineered and extremely hostile to grammar developers. I don't doubt that tree-sitter is performant and well-optimised to GitHub's needs, but TextMate's simplicity and flexibility are its key strengths to a grammar author. Moreover, it's not bound by the restrictions of a formal parser grammar, and it lends itself well to informal, unstructured formats like logs and program output. |
Be cool man, let's leave this thread to report issues with the grammar. Thanks @patrickt for the work you put in to syntax highlighting and for the detailed update! It would've been very easy to leave us in the dark so your posts are much appreciated. |
I’m afraid not, no. I appreciate your concerns and sympathize with your frustration, but this decision is set in stone at this point. The set of trade-offs with which we are working render us willing to accept temporary regressions (all of which, in my experience, have been trivially fixable) for the several axes on which tree-sitter provides concrete improvements. |
Hey, everyone 👋 Thanks to some absolutely heroic work on the part of @dcreager, we’ve addressed the regressions people have been observing. Please let us know if you spot any further infelicities, and thank you all very much for your patience while we fixed these issues. |
@patrickt I was literally about to point out how the regression appears to be fixed. Thanks for all your hard work! |
Thanks for your hard work. I’m sad to see Swift.tmLanguage go since it’s my baby, but I guess this will put more pressure on tree-sitter-swift to support all the language features. (Just look at all these things that are no longer highlighted: https://github.com/jtbandes/swift-tmlanguage/blob/main/grammar-test.swift) Maybe I will get around to contributing to it some day. |
@patrickt As of now, still seeing some issues with Swift files (like 'lazy', or 'dataSource' / 'delegate'), but overall seems much better! (or current PR just has no specific code that was problematic 😄 ) |
@Kaspik Those issues would be due to the switch from Swift.tmLanguage to tree-sitter-swift. |
Interestingly, I'm still seeing the Swift tmLanguage grammar being used in some places. It looks like it's used for diffs in Markdown files but not Swift files: https://gist.github.com/jtbandes/34a7a9fcc972f1d4d3f3ca6a39b86ccd/revisions |
@jtbandes Keen eye regarding Markdown—yes, the fact that embedded Markdown goes through the legacy TextMate path is an implementation detail we hope to fix someday (right now Markdown itself does not go through tree-sitter, which would be a first step, but after that it also requires some thought re. implementation). Thank you also for the Swift grammar test file you’ve written—I hadn’t seen this before, and this will greatly assist getting |
I think having Markdown done through old implementation is actually a good thing because you have direct comparison what's needed and left to be improved in the implementation. The Markdown code above is much easier to read than the Swift file. I wouldn't migrate Markdown till the Swift one fixed. |
@lildude I think this can be closed absent further reports of TM grammars breaking. |
Thanks @patrickt. |
Edit: seems like this is fixed now Hey folks, it looks like single-line comments in C++ are broken: https://github.com/jackma/mcap_mqtt_recorder/blob/main/main.cpp#L26 Although it does work properly in a markdown code snippet and gist: // Return the stamp property if set, otherwise use the current time
mcap::Timestamp GetTimestampFromProperties(
mqtt::properties properties, const std::string& stamp_property) https://gist.github.com/jtbandes/6b632b45e4a88ce178d7c9b7b1810b91 |
Hi 👋 Python code blocks in rst are now 100% underline: https://github.com/piskvorky/smart_open/blob/v7.0.4/README.rst Is it a remnant of this issue, or should I create a new one? |
This is a whole new issue and nothing to do with Linguist or the original issue. github/markup is the repo responsible for rendering codeblocks in rst files so an issue needs to be logged there or directly with GitHub support. |
Locking this issue as all of the issues originally reported have been resolved. Any new issues need to be logged with the upstream grammar repo as normal. |
Opening this to have a referable issue.
As @Alhadis reported there is an issue with separate syntax highlighting engine that is causing syntax highlighting issues…
This is not an issue with linguist and not likely to be an issue with the grammars we bundle.
An issue has been opened with the team that manages the highlighting engine and I'll keep this issue updated as I hear more.
The text was updated successfully, but these errors were encountered: