Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to improve on font lock keywords naming scheme? #641

Closed
vspinu opened this issue Aug 24, 2018 · 20 comments
Closed

How to improve on font lock keywords naming scheme? #641

vspinu opened this issue Aug 24, 2018 · 20 comments

Comments

@vspinu
Copy link
Member

vspinu commented Aug 24, 2018

Follow up on #538 and #621 which I intentionally haven't re-read in order to have a more independent prospective.

Currently the font-lock menu looks like this:

font-lock-submenu

To my mind the four new *-keywrods take a lot of space, bring little functionality and are simply confusing. By eyeballing the sub-menu it's not clear what keywords and bare-keywords are and what's the relationship between them, it's somewhat clear what control-flow-keywords are and signal-keywords just sound strange.

When you navigate to the help page of ess-R-fl-keyword:bare-keywords in hope to see the actual keywords you get this:

Documentation:
Font-lock keywords that do not precede an opening parenthesis.

Value:
(eval let
      ((bare-keywords
        (delq nil
              (mapcar
               (lambda
                 (keyword)
                 (car
                  (member keyword ess-r--bare-keywords)))
               ess-R-keywords))))
      (cons
       (regexp-opt bare-keywords 'words)
       'ess-keyword-face))

Now if you go and figure out how those bare-keywords and plain keywords are actually generated then it turns out that there is no custom variable for any of those and in fact ess-R-keywords does not map to ess-fl-keywrods:keywords.

Finally, the keywords behave plain funny. For example If you disable keywords then if is no longer highlighted but else is. This is a consequence of the split based on whether they precede open parenthesis or not. I know we discussed this and I agreed, but boy, this stupefies me right now. Why should users care about open parenthesis? I do see if and else as part of the same group and I do think most of the users would agree.

I don't think the above situation is satisfactory. It's a pretty obvious example of an over-engineering and I think we should simplify it. At least we should reduce those to 3 groups and map them one on one to some custom variables to which we could redirect users in the documentation.

@vspinu
Copy link
Member Author

vspinu commented Aug 24, 2018

And why if and else are not part of control flow group?

@jabranham
Copy link
Contributor

jabranham commented Aug 24, 2018

I agree the current situation is a little crazy :-)

I think most other major modes just present "levels" of fontification. This would let users set it the way they can for other major modes using something like (setq font-lock-maximum-decoration '(ess-mode . 2)) This would be pretty simple to implement, as you just pass a list to the first element of font-lock-defaults when building the major mode.

The current conventions for levels (taken from the Elisp manual) are:

   Here are the conventions for how to define the levels of
fontification:

   • Level 1: highlight function declarations, file directives (such as
     include or import directives), strings and comments.  The idea is
     speed, so only the most important and top-level components are
     fontified.

   • Level 2: in addition to level 1, highlight all language keywords,
     including type names that act like keywords, as well as named
     constant values.  The idea is that all keywords (either syntactic
     or semantic) should be fontified appropriately.

   • Level 3: in addition to level 2, highlight the symbols being
     defined in function and variable declarations, and all builtin
     function names, wherever they appear.

@vspinu
Copy link
Member Author

vspinu commented Aug 24, 2018

I think most other major modes just present "levels" of fontification.

This is actually how it was before and it's a pretty awful idea. Emacs maintainers wanted to change that for long. They just couldn't figure out a better solution. When I proposed the solution which we had in ESS about 4-5 years ago, Stefan Monier even said it was the way to go.

@vspinu
Copy link
Member Author

vspinu commented Aug 24, 2018

I would propose just two keywords groups. One ess-R-fl-keyword:message which would include ("message" "warning" "signalCondition" "withCallingHandlers") and all other keywords in in the old ess-R-fl-kwyeord:keywords. I think it's visually equivalent to what we have right now and much much simpler.

@lionel-
Copy link
Member

lionel- commented Aug 24, 2018

I think you consider the main UI to be the fl variables with regexps while I consider the main UI to be the keywords variables. I think the latter makes more sense because regexps are an implementation detail.

To illustrate, ess-R-fl-keyword:bare-keywords was a private variable in my mind. If you look at the keywords variables there is no ess-R-bare-keywords, only `ess-R-keywords because the notion of bare keywords is not user-facing.

Why should users care about open parenthesis?

They shouldn't which is why I consider the regexps to be an implementation detail.

Whatever solution we end up with: Keywords like if without parentheses shouldn't be fontified as keywords (especially in console output). Same thing for non-reserved keywords likeswitch or message which are common parameter names: list(message = "foo") shouldn't fontify message.

@lionel-
Copy link
Member

lionel- commented Aug 24, 2018

Whatever solution we end up with

And second thing, it should be easy to add and remove keywords.

@lionel-
Copy link
Member

lionel- commented Aug 24, 2018

Maybe with one more level of indirection (or some refactoring) we can have the main configuration variable take a list of symbols:

(setq ess-r-font-lock-groups '(keywords signal fun-defs))

This variable would enable or disable groups as a whole. In addition you can modify the -keywords variables to add or remove specific keywords:

(setq ess-R-signal-keywords (add-to-list 'ess-R-signal-keywords "my_function"))

Not all groups have corresponding keywords, for example fun-defs wouldn't have any. When a user wants to use their own regexps, they'd first disable the font-lock groups overlapping with their regexp and then use the usual font-lock configuration UI.

With this UI the regexps variables would be private. Which makes sense because we might want to use a different strategy (e.g. a syntactic function) to fontify some groups.

Finally, if you'd like to simplify things, maybe we should merge the modifiers and signal keywords groups since they use the same face (more precisely signal inherits from modifiers)?

@vspinu
Copy link
Member Author

vspinu commented Aug 25, 2018

Why should users care about open parenthesis?

They shouldn't which is why I consider the regexps to be an implementation detail.

Ok. Then I guess we agree that for the purpose of the UI we need to merge the regexps.

Question. Outside of the font-lock do you use or plan to use any of those variables?

Maybe with one more level of indirection (or some refactoring) we can have the main configuration variable take a list of symbols:

We already have two levels of indirection. Those entries in the menu should be groups and that's easy to achieve in the current architecture.

While regexp might seem like an implementation detail font-lock-keywords are part of the UI. Pretty much every emacs user knows about those and that each entry in that list does highlight something. So it's a natural choice for a checkbox menu like we have. Any extra layer on top would be a complication.

Finally, if you'd like to simplify things, maybe we should merge the modifiers and signal keywords groups since they use the same face (more precisely signal inherits from modifiers)?

This might be a good idea. Two things that bother me. First, modifiers is not suggestive. Why library, require or message are modifiers? Second, the face which we use to highlight modifiers (font-lock-constant-face) is not emphatic enough in most themes. I am used to message and warning being bold as stop is. Because stop is similar to message and warning and in other languages is colored the same we should preserve the emphasis. So maybe we inherit from built-in-face for modifiers? In most of the themes I have tried built-in and keyword have same weight.

To wrap up, would it be ok for the purpose of the UI to merge all keywords into fl-keyword:keyword group and current signal-keywords into modifiers?

@lionel-
Copy link
Member

lionel- commented Aug 25, 2018

I really think this is a desirable UI for all ESS users:

(setq ess-r-font-lock-groups '(keywords signal fun-defs))

With a corresponding menu entry. Those intelligible symbols are much better than what we have now (which is not hard, I agree the state of this menu is not good) or what we had before. The documentation would point users to the corresponding keyword variables that they can modify for finer-grained control.

font-lock-keywords are part of the UI. Pretty much every emacs user knows about those

Agreed, but then they can construct their own font-lock variables, and it's easy to remove those of ESS that interfere with it by removing the relevant symbol from the ESS font-lock groups. It makes sense to keep our regexp variables private. We should be free to split up regexps, switch to a different font-lock data structures, remove the regexps because we fontify with the syntactic function, etc.

@vspinu
Copy link
Member Author

vspinu commented Aug 25, 2018

Those intelligible symbols are much better than what we have now

It's not different from what we have right now (besides being more complex). If some names are not intelligible enough they should be changed. We have ess-R-font-lock-keywords which user can configure by checking some checkboxes either in the menu or custom interface. The big benefit is that they map on font-lock-keywrods and that's something which I don't want to abandon for the sake of simplicity.

In any case, such a complex change is out of question for the next release.We can discuss it on some other occasion.

t makes sense to keep our regexp variables private. We should be free to split up regexps, switch to a different font-lock data structures, remove the regexps because we fontify with the syntactic function, etc.

Ok. I think we agree entirely here. We can construct those regexes as we like. It can even be a function. As long as the doc says exactly what that entry in font-lock-keywrods does and which variables should be customized everyone should be happy.

I will make a few changes and will propose a PR on this so that we can discuss in more concrete terms.

@lionel-
Copy link
Member

lionel- commented Aug 25, 2018

It's not different from what we have right now

The symbol list is made for users while the font-lock regexps list is made for font-lock. I'd rather have the ESS UI be made for users without details like font-lock regexps leaking in it. It can't be that complicated to implement, it's a simple mapping.

Anyway:

  • Let's merge the regexp variables and maybe simplify eval forms by calling functions instead of inlining the logic.

  • Move message and warning with stop as it was before if you think that's better for users and not just because of habit. It's now easy to modify this on the user side, but we should have the right default. Before making that change I'd like to hear what @mmaechler, @jabranham and any user reading this and having tested the new scheme thinks?

  • I think we should name the keyword variables after the Emacs face they inherit from. How about ess-r-keyword-words, ess-r-builtin-words, ess-r-constant-words etc? Then it's easy to know what variable should be modified.

@vspinu
Copy link
Member Author

vspinu commented Aug 25, 2018

It can't be that complicated to implement, it's a simple mapping.

It's not complicated to implement, it's complicated to reason about because of the 3 levels of indirection and the benefits are marginal to the check list what we already have. I do see "regexp leaking" as a plus. People with marginal lisp skills but knowledge of regexp should be able to see what's going on. That's definitely not a good reason to change the UI.

Let's merge the regexp variables and maybe simplify eval forms by calling functions instead of inlining the logic.

I will do that that for :keywords only. Not much point in flooding emacs with trivial functions just to hide one eval.

Move message and warning with stop as it was before if you think that's better for users and not just because of habit.

That basically means reverting back to old ways. I think the main point of the original initiative was to have message and warning separate from stop and other "interrupts".

Just to be clear, I am ok with the split but I am not ok with using fonts with different weight for those two groups. I think either we have to make message and warning of font-lock-built-in-face or we should inherit from ess-function-call-face but with bold weight.

think we should name the keyword variables after the Emacs face they inherit from.

It's hard to map those one on one. We use constant face for a bunch of unrelated things (operators, paren, modfiers etc.). For essential things the correspondence is there keywords -> keyword-face constants -> constant-face. And, frankly, emacs constructs should have little to say about R syntactic constructs in the first place.

@lionel-
Copy link
Member

lionel- commented Aug 25, 2018

it's complicated to reason about because of the 3 levels of indirection

The current scheme can surely be refactored in a way that makes sense, should we change the UI. It isn't a big deal either way, we should talk about what UI is desirable.

I do see "regexp leaking" as a plus. People with marginal lisp skills but knowledge of regexp should be able to see what's going on.

It's fine if people want to examine implementation details, but I think you underestimate how daunting font-lock is for most users. Anyway I guess we'll keep the statu quo since you feel strongly about it and we can talk about this again if we ever change the implementation strategy for a particular construct. Then the regexp list will get in the way because it has not been abstracted away. Like the ess-R-fl-keyword:%op% that mysteriously (from the user point of view) disappeared from that list some releases ago just because we changed the implementation to a non-regexp approach.

we should inherit from ess-function-call-face but with bold weight.

I like that idea.

It's hard to map those one on one. We use constant face for a bunch of unrelated things (operators, paren, modfiers etc.).

Right but operators wouldn't have a -words variable. These variables would be for simple keywords and function names only.

And, frankly, emacs constructs should have little to say about R syntactic constructs in the first place.

Yeah but merging the modifiers and signal keywords doesn't make much sense in terms of R semantics hence the idea of just using the face as descriptor, at least it makes sense in terms of the user theme.

@vspinu
Copy link
Member Author

vspinu commented Aug 25, 2018

we'll keep the statu quo since you feel strongly about it and we can talk about this again if we ever change the implementation strategy for a particular construct.

Yes. We surely cannot change it before the release for obvious reason.

I feel strongly against the complexities of the new UI which I think you under-estimate. Just think that each group (keywords, fn-def, ops etc) would need to be mirrored. So you end up with 3 parallel set of variables (1) raw list of keyword (like ess-R-modifiers) (2) font-lock keywords (like ess-R-fl-keyword: modifiers) and (3) the symbol modifiers in the grouping construct. That's just too complex for no or marginal benefit.

Then the regexp list will get in the way because it has not been abstracted away. Like the ess-R-fl-keyword:%op% that mysteriously (from the user point of view) disappeared from that list some releases ago just because we changed the implementation to a non-regexp approach.

That's an orthogonal problem which is fixed now. The core reason is that if we change a default value of a defcustom it's not propagated to the user config. Even if we would have another UI we would still have to take care how to merge users's config with the default configuration which we could change.

@lionel-
Copy link
Member

lionel- commented Aug 25, 2018

Even if we would have another UI we would still have to take care how to merge users's config with the default configuration which we could change.

I was talking about something else. Right now users no longer have the possibility of disabling that particular fontification like they used to (which I don't think is a big deal, but it illustrates the point). With my proposal the configuration would stay the same no matter the implementation, e.g. '%op% in that case. Or we could fold it in with the other operators, then 'operators would represent a mix of regexps and syntax-propertize effects.

I feel strongly against the complexities of the new UI which I think you under-estimate.

Maybe. If I have time I'll have a go at a proof of concept, then at least we can talk over concrete code?

@vspinu vspinu closed this as completed Aug 28, 2018
@lionel-
Copy link
Member

lionel- commented Aug 28, 2018

@vspinu Could the font-lock menu be made to look more like this?

[x] Keywords: Control flow functions and reserved words
[x] Modifiers: Functions with side effects like library()
[x] Constants: Reserved words for constant values like NULL
[ ] Assignment operators
[ ] Other operators

And so on.

@vspinu
Copy link
Member Author

vspinu commented Aug 28, 2018

Could be but we would need names for each of those variable, either within the same data structure or outside. Both of the above would mean a slightly more convenient menu but quite inconvenient UI for those who want to customize in .emacs. It would also mean complicating the defcustom and thus tracking multiple versions of the same name etc.

But ... if one is smart enough then that informative string could be picked from the first line of the docstring of the font-lock-keyword ;) Both for the menu and for the defcustom tag.

In any case we probably shouldn't hide the name of the variable for the sake of those (many) people who customize in .emacs and those developers who want to jump to the definition of the keyword. So it should look like Reserved words for sontant values like NULL (ess-R-fl-keyword:constant).

@lionel-
Copy link
Member

lionel- commented Aug 29, 2018

I'm all for a convenient .emacs UI as I almost never use customize or the menus. That's why I've been arguing for a slightly different approach than what we have now. The complex-looking symbols (11 of them), the invitation to read their contents and docstrings (irrelevant complex regexps) and the cons cells are needlessly getting in the way. I know you're attached to the current UI so I'll look into improving things based on it.

@vspinu
Copy link
Member Author

vspinu commented Aug 29, 2018

I am not attached, I don't see an uniformly better solution. I am pretty sure whatever one could do here is a marginal improvement. The number of 11 groups cannot be reduced, that's pretty clear. Cons cells could be reduced by making a list that would include only font-lock keywords which are needed (this is marginal). The names could be changed but that would mean two parallel symbols with similar meaning, so it's out of consideration as well.

Finally, any new system should be compatible to whatever we have right now. People have customized it over the years and it should work further on. I am all eyes for a different solution. You can try giving it a stab if the current situation really bothers you so much and then we could talk in a more concrete terms.

@lionel-
Copy link
Member

lionel- commented Aug 29, 2018

The number of 11 groups cannot be reduced, that's pretty clear.

Right, I just meant the number of configuration variables exposed to the user, so that it's not so overwhelming. This can be reduced from 1 + 11 + 3 to 1 + 3. Sorry to insist, just want to be clear.

You're right compatibility might be tricky and not worth changes. I'll start by making the syntax-propertize stuff configurable again by inspecting the current font-lock regexps variable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants