Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove scanner, rewrite grammar.js #16

Merged
merged 3 commits into from
Sep 28, 2022
Merged

remove scanner, rewrite grammar.js #16

merged 3 commits into from
Sep 28, 2022

Conversation

justinmk
Copy link
Member

@justinmk justinmk commented Sep 16, 2022

fix #7
fix #9
fix #10
fix #11
fix #14
fix #12 (non-nested)
fix #13 (non-nested)
fix #17


================================================================================
Headline with tag at the start
headline with tag at the start
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use lowercase descriptions so that tree-sitter test --filter headline is easy to use.

@justinmk
Copy link
Member Author

Incompatible language version 14. Expected minimum 13, maximum 13

@vigoux any idea how I can fix this? Is CI installing an older version of tree-sitter CLI?

@justinmk justinmk marked this pull request as ready for review September 19, 2022 00:44
@justinmk justinmk changed the title WIP: remove scanner, rewrite parser in grammar.js remove scanner, rewrite grammar.js Sep 19, 2022
@justinmk
Copy link
Member Author

code_block is kinda working but still needs work. Other than that, all tests are passing (plus several new tests).

@vigoux
Copy link
Member

vigoux commented Sep 19, 2022

It is pretty surprising... I'll look into this problem today.

@vigoux
Copy link
Member

vigoux commented Sep 19, 2022

I pushed a fix for that in latest master.

To ensure consistency of the parser version, I recommend you to:

  • Run npm install
  • Run make

This should handle all the version management for us now.

@clason
Copy link
Member

clason commented Sep 19, 2022

Yes, looks like CI installs an old parser (0.20.0)?

As ABI 14 is now (0.20.7) the default, it would be better to update the CI.

@vigoux
Copy link
Member

vigoux commented Sep 19, 2022

The CI uses the locked version of tree-sitter.

In latest master the locked version now supports ABI 14, and furthermore, the default ABI for this parser should now be 14.

@clason
Copy link
Member

clason commented Sep 19, 2022

Any interest in including queries in this repo, too, so CI can catch breaking changes that require query updates on PRs?

(Does this PR require query updates?)

@vigoux
Copy link
Member

vigoux commented Sep 19, 2022

Would be good too.
I'll have to look up the names of the captures for markup languages.

@clason
Copy link
Member

clason commented Sep 19, 2022

You can start with https://github.com/nvim-treesitter/nvim-treesitter/blob/master/queries/help/highlights.scm
(sans the quote conceal).

We don't need to follow the strict corset of nvim-treesitter, but it would help; the allowed text captures are here: https://github.com/nvim-treesitter/nvim-treesitter/blob/master/CONTRIBUTING.md#text

@justinmk
Copy link
Member Author

justinmk commented Sep 19, 2022

(Does this PR require query updates?)

yeah, mostly the introduction of:

  • block (which contains line).
  • url
  • keycode (TODO)
  • special (TODO)

I will probably alias (line_noeol) => (line) and (block_end) => (block) since there isn't much reason to expose those.

For "inner content" I have named the field either "name" or "text", to add some consistency.

If we're open to renaming things, I suggest:

  • hotlink => taglink
  • option => optionlink
  • headline with "===" => h1
  • headline with "---" => h2
  • UPPERCASE WORDS => h3
  • column_heading unchanged, it is special and not really a "heading"
  • backtick => codespan
  • code_block => codeblock

README.md Outdated Show resolved Hide resolved
grammar.js Outdated Show resolved Hide resolved
@justinmk
Copy link
Member Author

code_block is working. Only 2 failing tests remaining for "option". Then will add keycode and keychord, then this is ready I think.

@clason
Copy link
Member

clason commented Sep 19, 2022

If we're open to renaming things, I suggest:

We are. If you add the new queries here, I'll just plonk them into nvim-treesitter when bumping this parser. (Makes my life much easier.)

(The parser is currently marked as "experimental", so I have no qualms about breaking things. Could think about removing that label, though, if you deem things stable enough after this PR.)

@clason

This comment was marked as off-topic.

@vigoux
Copy link
Member

vigoux commented Sep 20, 2022

Without changing the "spec" for help files, I think that this cannot be handled in the parser.

The parser user would have to guess the filetype of the codeblock by looking at what it contains otherwise.

@justinmk
Copy link
Member Author

any chance the new code_block can specify the language (either Lua or Vimscript

Isn't that something that can be added later? Could also extend the vimdoc spec as @vigoux mentioned, e.g. vimscript> and lua>.

@clason

This comment was marked as off-topic.

@vigoux
Copy link
Member

vigoux commented Sep 20, 2022

By the way, I am okay with renaming things. I think taking a fresh start is great.

@clason

This comment was marked as off-topic.

@vigoux
Copy link
Member

vigoux commented Sep 23, 2022

For the record, this seems to fail parsing some options.

@justinmk
Copy link
Member Author

For the record, this seems to fail parsing some options.

Yes, I need to fix some stuff

@justinmk
Copy link
Member Author

justinmk commented Sep 24, 2022

options are fixed (locally, WIP), now 2 other issues to fix :D getting close...

@justinmk
Copy link
Member Author

justinmk commented Sep 24, 2022

Ok all tests are passing and I think the results are pretty good. I will note the compromises as PR comments.

I didn't rename anything yet, nor add highlights. Should that wait for a followup PR?

(line
(argument
(word)
(ERROR))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave up on this for now, ideally it should be:

  (argument
    (word)
    (MISSING "}"))

corpus/backtick.txt Outdated Show resolved Hide resolved
================================================================================

Hello `world`, I am a markup language
Hello `world`, I am `markup language`. But `this is
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The grammar supports same-line whitespace in a codespan, but I'm not sure if that's actually supported by vimdoc format.

@justinmk justinmk marked this pull request as draft September 24, 2022 18:03
@justinmk justinmk marked this pull request as ready for review September 24, 2022 18:04
@clason
Copy link
Member

clason commented Sep 25, 2022

I didn't rename anything yet, nor add highlights. Should that wait for a followup PR?

Best to rip the bandaid off in one go.

And, yes, please add queries; it's good practice to keep them in sync, otherwise I'll have to manually fix the ones we bundle or it will block all updates in nvim-treesitter. (Can be improved later; the main point is that they don't error out with the new parser.)

@justinmk
Copy link
Member Author

justinmk commented Sep 26, 2022

Plan

  1. fix column_heading to not parse special forms like links, tags, etc. It only supports plaintext.
    • skipping this for now, consumers can just check if the parent is column_heading and then treat its children as plaintext.
  2. rename atoms
  3. ✅ add queries/highlights
  4. future PR: try to eliminate block_end and line_noeol (just for elegance)?
  5. future PR: introduce (keycode), (special) for more syntax features: keycodes, special, parameters #1

Problem:
Hand-written C scanner is hard to maintain, slow, and hangs on files
like `filetype.txt` and `usr_24.txt`.

Solution:
Delete hand-written C scanner, define grammar fully in `grammar.js`

- introduce `url`
- introduce `block`, a group of lines. (does not support nesting yet)
- introduce `line_li` for listitems. (does not support nesting yet)
- keycodes #1
- `[range]` #1

fix #1
fix #7
fix #9
fix #10
fix #11
fix #14
fix #12 (except nested)
fix #13 (except nested)
@justinmk
Copy link
Member Author

Latest push includes renamed forms. See updated comment #16 (comment) for summary.

@vigoux
Copy link
Member

vigoux commented Sep 27, 2022

You should have commit access now, feel free to merge this PR and take the parser over from here

@justinmk justinmk merged commit 2ba61cf into neovim:master Sep 28, 2022
@justinmk justinmk deleted the remove_scanner branch September 28, 2022 11:31
This was referenced Sep 28, 2022
justinmk added a commit to justinmk/neovim that referenced this pull request Sep 28, 2022
justinmk added a commit to justinmk/neovim that referenced this pull request Sep 28, 2022
justinmk added a commit to justinmk/neovim that referenced this pull request Sep 28, 2022
- adapt to parser changes from neovim/tree-sitter-vimdoc#16
- numerous other generator improvements
justinmk added a commit to justinmk/neovim that referenced this pull request Sep 28, 2022
- adapt to parser changes from neovim/tree-sitter-vimdoc#16
- numerous other generator improvements
justinmk added a commit to justinmk/neovim that referenced this pull request Sep 28, 2022
- adapt to parser changes from neovim/tree-sitter-vimdoc#16
- numerous other generator improvements
lvimuser pushed a commit to lvimuser/neovim that referenced this pull request Oct 6, 2022
- adapt to parser changes from neovim/tree-sitter-vimdoc#16
- numerous other generator improvements
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants