Rewrite RubyLex to fix some bugs and make it possible to add new features easily #500

tompng · 2023-01-09T13:40:04Z

Description

Fixes #499
In RubyLex, there are several parser-like methods, listed below. To fix the bugs, almost all of these methods need to change.

process_nesting_level
check_corresponding_token_depth
check_newline_depth_difference
check_string_literal
is_the_in_correspond_to_a_for
take_corresponding_syntax_to_kw_do
in_keyword_case_scope?
is_method_calling?
heredoc_scope?

I combined them all into a single parser that calculates open tokens for each line.
This reduces duplicated code.

What I changed

Rewrite nesting parser

Create IRB::NestingParser and use it from several methods.

# Example of open tokens for each line
if true # open tokens: ['if']
  puts( # open tokens: ['if', '(']
    1   # open tokens: ['if', '(']
  )     # open tokens: ['if']
end     # open tokens: []

each_line_result = RubyLex::NestingParser.parse_line(tokens)
line_tokens, open_tokens_before_line, open_tokens_after_line, minimum_token_depth_in_line = each_line_result[line_index]

# Example
puts((
    1 + 2
  ) + [ # HERE
    1
  ].size
)
# line_tokens: ['  ', ')', ' ', '+', ' ', '[', ' ', "# HERE\n"]
# open tokens changes: '((' → '(' (minimum) → '(['
# open_tokens_before_line: ['(', '(']
# open_tokens_after_line: ['(', '[']
# minimum_token_depth_in_line: 1

Prompt (check_corresponding_token_depth, check_string_literal), indent (process_indent_level, check_newline_depth_difference) and termination can all be calculated from open tokens.
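As a rough illustration of the per-line result described above, here is a minimal sketch that tracks only plain brackets with Ripper.lex. It is not the NestingParser from this PR: the method name bracket_nesting_by_line and the hash keys are hypothetical, and keywords, heredocs, string literals and lambda braces are deliberately ignored.

```ruby
require 'ripper'

# Simplified assumption: only (), [] and {} count as nesting tokens.
OPENERS = %i[on_lparen on_lbracket on_lbrace].freeze
CLOSERS = %i[on_rparen on_rbracket on_rbrace].freeze

# Hypothetical helper: returns { line_no => { before:, after:, min_depth: } }.
def bracket_nesting_by_line(code)
  open_tokens = []
  by_line = {}
  Ripper.lex(code).each do |(line_no, _col), event, tok, _state|
    # The first token seen on a line snapshots the state before that line.
    entry = by_line[line_no] ||= { before: open_tokens.dup, min_depth: open_tokens.size }
    if OPENERS.include?(event)
      open_tokens.push(tok)
    elsif CLOSERS.include?(event)
      open_tokens.pop
      entry[:min_depth] = [entry[:min_depth], open_tokens.size].min
    end
    entry[:after] = open_tokens.dup
  end
  by_line
end

code = <<~RUBY
  puts((
      1 + 2
    ) + [ # HERE
      1
    ].size
  )
RUBY

line3 = bracket_nesting_by_line(code)[3]
p line3[:before], line3[:after], line3[:min_depth]
```

With a result of this shape, termination is just "open tokens after the last line are empty", and the indent of a line can be derived from its minimum depth.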

Update test

Add tests for NestingParser
Delete some tests that are now covered by the newly added tests
Fix tests that were testing a broken feature

Refactor

I've done a minimal refactoring in ruby-lex.rb

~~Reduce instance variables, Split readmultiline from each_top_level_statement~~ (moved to Simplify each_top_level_statement #576)
Refactor the free-indentation-inside-heredoc feature, because it needs to be implemented in both process_indent_level and check_corresponding_token_depth

Bugs that cannot be fixed in this pull-request

This kind of code cannot be indented correctly (endless def inside a while condition):

if false
  while def f() = p do end and ()[] do end
end
if false
  while def f() = p do end && ()[] do end
  end
end

The only difference in the Ripper.lex result is [[2, 27], :on_kw, "and", BEG] versus [[2, 27], :on_op, "&&", BEG].
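The same keyword-versus-operator distinction can be seen on a much smaller input. The sketch below is only an illustration; the helper name operator_tokens is hypothetical, and it drops positions and lexer states to keep the comparison readable.

```ruby
require 'ripper'

# Hypothetical helper: keep only keyword and operator tokens as [event, token] pairs.
def operator_tokens(code)
  Ripper.lex(code)
        .map { |_pos, event, tok, _state| [event, tok] }
        .select { |event, _tok| %i[on_kw on_op].include?(event) }
end

p operator_tokens('a and b') # [[:on_kw, "and"]]
p operator_tokens('a && b')  # [[:on_op, "&&"]]
```

A token-based nesting parser sees these two inputs as different event streams even though they mean nearly the same thing.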

Other good things for the future

Testability

We can now test these functionalities separately.

nesting calculation for many kinds of Ruby syntax
open tokens to indent conversion
open tokens to prompt conversion
actual indentation

Indent

We can now implement heredoc indent like the code below.
Previously, we could not implement it because indent was calculated using indent += 1 and indent -= 1.

if true
  if true
    s = <<HEREDOC
#{ # nesting level gets deeper but indent gets shallower
  (
    1
  )
}
heredoc
HEREDOC
    puts s # restores indent
  end
end

Parsing logic and indent calculation logic are now separated, so we can easily update the indent of a specific syntax.

# easy to implement this indent
words = %w[
  irb
  reline
]
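To make the heredoc case above concrete, here is a minimal sketch of computing indent directly from a list of open tokens instead of incrementing and decrementing a counter. Everything in it is an assumption for illustration: open tokens are represented as plain strings, a heredoc opener is recognized by its `<<` prefix, and the name indent_from_open_tokens and the two-space width are hypothetical. It is not the logic implemented in this PR.

```ruby
INDENT_WIDTH = 2 # assumed indent width

# Hypothetical helper: only tokens opened after the innermost heredoc count,
# so nesting gets deeper while the indent restarts from column 0.
def indent_from_open_tokens(open_tokens)
  heredoc_index = open_tokens.rindex { |tok| tok.start_with?('<<') }
  effective = heredoc_index ? open_tokens[(heredoc_index + 1)..] : open_tokens
  effective.size * INDENT_WIDTH
end

p indent_from_open_tokens(['if', 'if'])                          # 4
p indent_from_open_tokens(['if', 'if', '<<HEREDOC'])             # 0
p indent_from_open_tokens(['if', 'if', '<<HEREDOC', '#{', '('])  # 4
```

Because the indent is a pure function of the open tokens, closing the heredoc restores the previous indent without any bookkeeping.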

Prompt

Prompt string is calculated from ltype, indent, continue and line_no.
We can now easily change it.

> string = "hello #{[
>   1, # ltype is currently `"` here because it is restricted to string-like literal
>   2, # changing ltype to `]` might be better
> ].join} world"

Completion

Currently, completion is implemented using regexp. array[index].??? shows Array methods and array.map{}.??? shows Hash | Proc methods.
We can get an S-expression or a syntax tree of incomplete code and use it for accurate completion.

# close the innermost open token first, so reverse the open tokens
closing_tokens_calculated_from_open_tokens = open_tokens.reverse.map { |t| closing_token_from(t) }
Ripper.sexp(incomplete_code + closing_tokens_calculated_from_open_tokens.join)
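A runnable version of this idea might look like the sketch below. The mapping table and the sexp_of_incomplete helper are hypothetical and cover only a handful of tokens; open tokens are assumed to be plain strings ordered from outermost to innermost, so they are closed in reverse order.

```ruby
require 'ripper'

# Assumed mapping from an open token to the text that closes it (far from complete).
CLOSING_TOKENS = {
  '(' => ')', '[' => ']', '{' => '}',
  'if' => "\nend", 'def' => "\nend", 'do' => "\nend", 'class' => "\nend"
}.freeze

def closing_token_from(open_token)
  CLOSING_TOKENS.fetch(open_token)
end

# Hypothetical helper: complete the code with closers, innermost first, then parse it.
def sexp_of_incomplete(incomplete_code, open_tokens)
  closers = open_tokens.reverse.map { |tok| closing_token_from(tok) }
  Ripper.sexp(incomplete_code + closers.join)
end

incomplete_code = "if x\n  foo([1, 2"
p Ripper.sexp(incomplete_code).nil?                             # true, the input is not valid Ruby yet
p sexp_of_incomplete(incomplete_code, ['if', '(', '[']).nil?    # false, a tree is returned
```

The resulting tree could then be inspected to find the receiver expression at the cursor, instead of guessing it with a regexp.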

tompng · 2023-01-09T16:03:38Z

Changed to draft because it gets into an infinite loop in Ruby 2.7. Trying to fix it now.

st0012

Thank you so much for the rewrite 🙏 I believe it’ll be a tremendous improvement to IRB.
But given the size of the change, we may need to go through this slowly with multiple reviews. I hope that’ll be ok.

I’ve given it a couple of scans and I have a question: Should we make context an instance variable?

From what I see, RubyLex can be used as an instance (e.g. scanner.set_input) or as a helper class (e.g. RubyLex.generate_local_variables_assign_code). And currently both usages would take context as an argument.

But for the former usages, the invocations could be largely simplified if context could be stored as an instance variable. So I did a quick search on RubyLex.new and I think it should be possible to do:

In ShowSource it could access the context through irb_context (provided by Nop)
In Irb.initialize it has context defined just a couple lines prior.
In RubyLex.set_input it also has access to context through itself. I wonder why we need to initialise another RubyLex instance here though.

If the answer is yes, I will do that refactor before merging this PR, which is likely to cause some conflicts. So this is why I’m asking here 🙂

lib/irb/ruby-lex.rb

tompng · 2023-01-10T20:51:47Z

@st0012

Should we make context an instance variable?

Thanks for the explanation. I think it's a very good idea too. My answer is yes.

likely to cause some conflicts

I'll resolve it when it's needed 😄

st0012 · 2023-01-10T21:44:27Z

I ended up doing 3 types of refactor:

I hope once they're all merged, it'll make the rewrite even simpler 🙂

st0012 · 2023-01-14T09:26:46Z

@tompng All 3 refactor PRs have been merged 😄

tompng · 2023-01-14T11:06:34Z

Thanks, I rebased it

lib/irb/ruby-lex.rb

lib/irb/nesting_parser.rb

lib/irb/ruby-lex.rb

st0012

I like that we're getting more detailed coverage with this rewrite and I think some small tweaks can make the new tests even more approachable 👍

test/irb/test_nesting_parser.rb

st0012

This is amazing work, I think we're pretty close to merging it 👍

st0012 · 2023-06-13T11:19:34Z

test/irb/test_nesting_parser.rb

+      assert_equal(code.lines.size, line_results.size)
+      class_open, *inner_line_results, class_close = line_results
+      assert_equal(['class'], class_open[2].map(&:tok))
+      inner_line_results.each {|result| assert_equal(['class'], result[2].map(&:tok)) }


Just curious: why do we pick class as the target for checking Ruby syntax? Is the assumption: "if the class is not accidentally closed by any of the complicated syntax inside it, then we assume the nesting parser is working correctly"?

Yes, the intention of the test is to ensure "class is not accidentally closed".
We can also use if true or other nesting syntax instead of class A.

I think class A is fine. But perhaps in later PRs we can try to match the inner lines' tokens too to increase the coverage.

test/irb/test_nesting_parser.rb

lib/irb/ruby-lex.rb

st0012

Let's merge it 🚀

add new features easily (ruby/irb#500)

* Add nesting level parser for multiple use (indent, prompt, termination check)
* Rewrite RubyLex using NestingParser
* Add nesting parser tests, fix some existing tests
* Add description comment, rename method to NestingParser
* Add comments and tweak code to RubyLex
* Update NestingParser test
* Extract list of ltype tokens to constants

tompng marked this pull request as draft January 9, 2023 14:50

tompng force-pushed the rewrite_rubylex branch from f9983f9 to 9cfa1d3 Compare January 9, 2023 18:17

tompng marked this pull request as ready for review January 9, 2023 19:40

st0012 reviewed Jan 10, 2023

tompng force-pushed the rewrite_rubylex branch from 9cfa1d3 to 45369cf Compare January 14, 2023 10:59

tompng mentioned this pull request Jan 14, 2023

multiline_repl do not need to depend on RubyLex ruby/reline#502

Merged

tompng mentioned this pull request Feb 5, 2023

Improve indentation: bugfix, heredoc, embdoc, strings #515

Merged

st0012 added the bug Something isn't working label Mar 1, 2023

tompng mentioned this pull request Apr 30, 2023

Simplify each_top_level_statement #576

Merged

tompng marked this pull request as draft May 19, 2023 14:37

tompng force-pushed the rewrite_rubylex branch 4 times, most recently from 3a92159 to b7b46ee Compare May 20, 2023 04:50

tompng marked this pull request as ready for review May 20, 2023 05:08

tompng mentioned this pull request Jun 2, 2023

IRB support ruby/prism#930

Closed

tompng commented Jun 10, 2023

st0012 reviewed Jun 11, 2023

st0012 reviewed Jun 12, 2023

tompng added 4 commits June 12, 2023 23:21

Add nesting level parser for multiple use (indent, prompt, termination check)

519d88f

Rewrite RubyLex using NestingParser

1049d3f

Add nesting parser tests, fix some existing tests

0e501b0

Add description comment, rename method to NestingParser

60a71ed

tompng force-pushed the rewrite_rubylex branch 2 times, most recently from 16be166 to 6642387 Compare June 13, 2023 10:48

st0012 reviewed Jun 13, 2023

tompng mentioned this pull request Jun 13, 2023

Keep prev_spaces feature #605

Closed

tompng force-pushed the rewrite_rubylex branch from 6642387 to cccf5c3 Compare June 13, 2023 13:08

tompng added 3 commits June 13, 2023 23:01

Add comments and tweak code to RubyLex

f3ea2b5

Update NestingParser test

5f12659

Extract list of ltype tokens to constants

d1f1a7b

tompng force-pushed the rewrite_rubylex branch from cccf5c3 to d1f1a7b Compare June 13, 2023 14:01

st0012 approved these changes Jun 15, 2023

tompng merged commit 1b17101 into ruby:master Jun 15, 2023

tompng deleted the rewrite_rubylex branch June 15, 2023 15:40

This was referenced Jun 15, 2023

Keep prev spaces #607

Merged

Omit nesting_level, use indent_level to build prompt string #610

Merged

Fix process_continue and check_code_block #611

Merged

tompng mentioned this pull request Jun 28, 2023

Use Ripper.sexp instead of Regexp for completion #616

Closed

tompng mentioned this pull request Jul 14, 2023

Indent multiline percent literals #643

Merged

smmr0 mentioned this pull request Aug 20, 2023

Remove unused PROMPT_N #685

Merged
