-
Notifications
You must be signed in to change notification settings - Fork 121
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrite RubyLex to fix some bugs and make it possible to add new features easily #500
Conversation
changed to draft because it got infinite loop in ruby2.7. trying to fixing it now |
f9983f9
to
9cfa1d3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for the rewrite 🙏 I believe it’ll be a tremendous improvement to IRB.
But given the size of the change, we may need to go through this slowly with multiple reviews. I hope that’ll be ok.
I’ve given it a couple of scans and I have a question: Should we make context
an instance variable?
From what I see, RubyLex
can be used as an instance (e.g. scanner.set_input
) or as a helper class (e.g. RubyLex.generate_local_variables_assign_code
). And currently both usages would take context
as an argument.
But for the former usages, the invocations could be largely simplified if context
could be stored as an instance variable. So I did a quick search on RubyLex.new
and I think it should be possible to do:
- In
ShowSource
it could access the context throughirb_context
(provided byNop
) - In
Irb.initialize
it has context defined just a couple lines prior. - In
RubyLex.set_input
it also has access to context through itself. I wonder why we need to initialise anotherRubyLex
instance here though.
If the answer is yes, I will to do that refactor before merging this PR, which is likely to cause some conflicts. So this is why I’m asking here 🙂
Thanks for the explanation. I think it's a very good idea too. My answer is yes.
I'll resolve it when it's needed 😄 |
I ended up doing 3 types of refactor:
I hope once they're all merged, it'll make the rewrite even simpler 🙂 |
@tompng All 3 refactor PRs have been merged 😄 |
9cfa1d3
to
45369cf
Compare
Thanks, I rebased it |
3a92159
to
b7b46ee
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that we're getting more detailed coverage with this rewrite and I think some small tweaks can make the new tests even more approachable 👍
16be166
to
6642387
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an amazing work, I think we're pretty close to merging it 👍
assert_equal(code.lines.size, line_results.size) | ||
class_open, *inner_line_results, class_close = line_results | ||
assert_equal(['class'], class_open[2].map(&:tok)) | ||
inner_line_results.each {|result| assert_equal(['class'], result[2].map(&:tok)) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious: why do we pick class
as the target for checking Ruby syntax? Is the assumption: "if the class is not accidentally closed by any of the complicated syntax inside it, then we assume the nesting parser is working correctly"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. the intention of the test is to ensure "class is not accidentally closed".
We can also use if true
or other nesting syntax instead of class A
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think class A
is fine. But perhaps in later PRs we can try to match the inner lines' tokens too to increase the coverage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's merge it 🚀
add new features easily (ruby/irb#500) * Add nesting level parser for multiple use (indent, prompt, termination check) * Rewrite RubyLex using NestingParser * Add nesting parser tests, fix some existing tests * Add description comment, rename method to NestingParser * Add comments and tweak code to RubyLex * Update NestingParser test * Extract list of ltype tokens to constants
Description
Fixes #499
In RubyLex, there are several parser-like methods listed below. To fix bugs, almost all of these methods needs change.
I combined them all into a single parser that calculates open tokens for each line.
Duplicated codes are reduced.
What I changed
Rewrite nesting parser
Create
IRB::NestingParser
and use it from several methods.Prompt(check_corresponding_token_depth, check_string_literal), indent(process_indent_level, check_newline_depth_difference) and termination can be calculated from open tokens.
Update test
Refactor
I've done a minimal refactoring in ruby-lex.rb
Reduce instance variables, Split(moved to Simplify each_top_level_statement #576)readmultiline
fromeach_top_level_statement
process_indent_level
andcheck_corresponding_token_depth
Bugs that cannot be fixed in this pull-request
Cannot indent these kind of code correctly. (endless def inside while condition)
The only difference of Ripper.lex result is
[[2, 27], :on_kw, "and", BEG]
and[[2, 27], :on_op, "&&", BEG]
.Other good things for the future
Testability
We can now test these functionalities separately.
Indent
We can now implement heredoc indent like the code below.
Previously, we cannot implement it because indent was using
indent += 1
andindent -= 1
.Parsing logic and indent calculation logic is separated, so we can now easily update indent of specific syntax.
Prompt
Prompt string is calculated from
ltype
,indent
,continue
andline_no
.We can now easily change it.
Completion
Currently, completion is implemented using regexp.
array[index].???
showsArray
methods andarray.map{}.???
showsHash | Proc
methods.We can get an S-expression or a syntax tree of incomplete code and use it for accurate completion.