25,000 context limit is too low #1687
Comments
The 25,000 context sanity limit has been present since `.sublime-syntax` was introduced; it isn't new. The issue, I believe, is that the default syntaxes have been improving over time, and the number of contexts has gone up as a result.

Even assuming we do increase the limit, the increase would go into the dev cycle first and then into a stable build, so it would likely be a number of months before any significant number of users received the new limit. For now, it seems you'll have to do some streamlining of the syntax to prevent the issue. It seems you've identified the pattern in the syntax causing the problem; perhaps it would be possible to use something like YAML Macros by @Thom1729 to generate the final syntax from your preferred structure.

Also, to properly track this feature suggestion, it should be posted at https://github.com/SublimeTextIssues/Core/issues. There may already be an issue there for this or a related suggestion.
Why is […]
@Thom1729: Where does the HTML syntax use […]?

@y0ssar1an: Even though the performance hit might not be obvious on modern systems, the RAM usage is. I hate the feeling of a text editor needing 1 GB of RAM just to edit a file a few KB in size. The more complex syntaxes cause ST's RAM usage to grow significantly, as all the contexts need to be stored somewhere. I am with @wbond on the question of increasing that limit.

I think hitting this limit shows that the complexity of the syntax definitions has reached a level where we need to rethink the whole strategy. While we are nominally talking about syntax highlighting, most syntaxes go far beyond that right now, trying to be a kind of static compiler catching every language detail rather than concentrating on highlighting. This enables all the goto features, and maybe more in the future, but comes with some negative side effects, as we see. I wonder how many meta scopes are not used by any plugin at all.

Don't misunderstand me, I really like the abilities ST offers for creating syntax definitions; they were one major reason for me to give it a try. Maybe I am wrong, but I think this level of complexity, and the way different syntaxes embed/include/extend each other, was not part of the initial design process of the sublime-syntax tokenizer. Especially the […]. Maybe some […].
I would think both of these problems could be solved by using pointers to contexts instead of copying the contexts around, though one would need to implement a proper cycle detector.
Exactly what I had in mind.
Contexts are copied when they are modified. Trying to use pointers with `with_prototype` would destroy performance, because N regex sets would need to be run on the text instead of 1.
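For readers unfamiliar with the mechanism: `with_prototype` injects extra patterns into every context reachable from a pushed context, which is why each of those contexts currently has to be copied. A minimal sketch of typical usage (a simplified, assumed example, not taken from the default syntaxes):

```yaml
# Embed PHP inside HTML; the with_prototype pattern is prepended
# to every context reachable from scope:source.php, so each of
# those contexts is copied with the extra pattern added.
- match: '<\?php'
  scope: punctuation.section.embedded.begin.php
  push: scope:source.php
  with_prototype:
    - match: '(?=\?>)'
      pop: true
```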
The performance concern makes sense, but in practice wouldn't N usually be 1, less often 2, and only be more than that if there were multiple […]? Also, how does `escape` fit into this?
Performance is something only the core devs can judge, as they know the internals. At least one additional layer to handle the pointered regex sets would be needed. In the end, the number of regex patterns being evaluated shouldn't change; it's "just" a question of creating a copy of the whole context and adding the `with_prototype` patterns in front of that list, versus reusing the existing context and somehow linking the additional `with_prototype` patterns so they are processed first (or vice versa). But sure, this needs more management, might impact overall performance, and may make things even more complicated to handle.

The […]

I wonder whether we could find a more efficient way to extend existing syntaxes, especially in cases where an injected context doesn't need to be part of each underlying context. Not an easy question, and one I don't have an answer for either.

Another idea to reduce the number of contexts needed in existing syntaxes was to at least make […]. If the following pattern

[pattern lost in extraction]

could match

[example lost in extraction]

things would become much easier and help reduce contexts, I think. Today we need something like

[multi-context equivalent lost in extraction]

to do the same thing.
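The original snippets were lost in extraction; as a hedged illustration of the general trade-off under discussion (an assumed example, not the author's original), compare assigning scopes with capture groups in a single pattern versus spelling the same construct out as several contexts:

```yaml
# Single-pattern form: one regex with captures scopes a whole
# `property: value` construct at once.
- match: '([\w-]+)(:)\s*([\w-]+)'
  captures:
    1: support.type.property-name.css
    2: punctuation.separator.key-value.css
    3: support.constant.property-value.css

# Multi-context form: the same construct spelled out as separate
# contexts, each adding to the total context count.
- match: '[\w-]+'
  scope: support.type.property-name.css
  push:
    - match: ':'
      scope: punctuation.separator.key-value.css
      set:
        - match: '[\w-]+'
          scope: support.constant.property-value.css
          pop: true
```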
As I understand it, all regexps for a context are pre-compiled into a single expression. If […]

The […]

Even notwithstanding the lookahead issue, I've found using several contexts to be more reliable and more flexible than relying upon capture groups.
I just saw your previous comment @Thom1729:
Well, I think the issue is that all regex sets would have to be run, and the results of each checked and compared to see which matches soonest. Our current implementation localizes everything in a context into a single contiguous chunk of memory, so with multiple regex sets we'd likely be spreading the data necessary for the next check across different parts of memory. It is true that in many cases N would be 1, but there are plenty of places where it wouldn't be. Ultimately, I think a big factor in experimenting with something like this would be the complexity of the implementation and the time it would take.

Escape is run once per line, and effectively bounds the lexing to the escape point, so it has a different impact than splitting context lexing into multiple regex sets.
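(The `escape` referred to here is the one paired with `embed`. A minimal hedged sketch of that shape, assuming the usual HTML-embeds-CSS case:)

```yaml
# The escape regex is evaluated once per line against the raw
# text, bounding how far the embedded CSS lexer may run.
- match: '<style[^>]*>'
  scope: punctuation.definition.tag.begin.html
  embed: scope:source.css
  escape: '(?=</style>)'
```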
That hadn't occurred to me. I can see how this could have a real performance impact even when N=2. Thanks for the insight!
... and often faster. Thanks @wbond for the clarification about what `escape` does.

Some debugging feature to see which syntaxes or constructs are most responsible for the growing number of contexts might help. Even though many syntaxes have become more complex, it isn't obvious to me why and where they hit 25,000 contexts; most of them only have a few hundred. Most of the impact seems to be caused by syntaxes including each other.
Thank you for the enlightening discussion. It sounds like raising the limit would be a mistake. I'll look for ways to streamline CSS3 rather than contribute to increasing the memory usage. Thanks again!
I'm the author of CSS3. Since the 25,000 context "sanity limit" was added to Sublime, my users have been hitting this error whenever they edit a file like HTML or PHP. I've only recently had the time to investigate.
25,000 seems like an insanely high number of contexts on the stack. I believe the assumption was that the only way to hit this limit was with an import loop (infinite recursion). CSS3 can hit this limit without any recursion. It is a total beast! It contains individual regexes for all 400+ CSS properties. I made this design choice because I thought it would be cool if only valid values were highlighted for every property.
In CSS3, there's a `properties` context which contains contexts for every CSS property. This is the context that exceeds the 25,000 limit. As a mitigation, I've created a `properties-minimal` context which removes ~100 lesser-used CSS properties. These will not be highlighted when you're editing CSS inside a `<style>` tag in an HTML document, for example. Even with `properties-minimal`, my users are still hitting the limit if they have lots of third-party packages installed.

I'm guessing the 25,000 number was chosen arbitrarily. Despite the admittedly crazy number of contexts that CSS3 loads, my users have never reported any performance problems. That's a testament to how fast Sublime is.
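To illustrate why per-property validation multiplies contexts, here is a hedged sketch (invented names, not CSS3's actual definitions) of the one-context-per-property pattern: each property match pushes its own value context, and every such context is duplicated again whenever the syntax is embedded elsewhere.

```yaml
# Hypothetical sketch: each of the 400+ properties gets its own
# match plus a dedicated value context, so only valid values are
# highlighted for each property. The context count then grows
# further when this syntax is embedded (e.g. in <style> tags).
contexts:
  properties:
    - match: '\bcolor\b'
      scope: support.type.property-name.css
      push: color-values
    - match: '\bdisplay\b'
      scope: support.type.property-name.css
      push: display-values
    # ... one entry per property ...
  color-values:
    - match: '\b(red|green|blue)\b'
      scope: support.constant.color.css
    - match: ';'
      pop: true
  display-values:
    - match: '\b(block|inline|flex)\b'
      scope: support.constant.property-value.css
    - match: ';'
      pop: true
```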
I'd like to ask about the possibility of raising the limit to 35,000 or higher. Based on the lack of performance issues with CSS3 before the limit was added, I suspect it could be raised without causing any problems. What do you think?
This would resolve #1665.