Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drop syntax cop and replace it with separate RuboCop cops #207

Closed
bbatsov opened this issue May 28, 2013 · 36 comments
Closed

Drop syntax cop and replace it with separate RuboCop cops #207

bbatsov opened this issue May 28, 2013 · 36 comments
Assignees
Milestone

Comments

@bbatsov
Copy link
Collaborator

bbatsov commented May 28, 2013

The syntax cop has several major issues:

  • Running it is extremely slow - 3/4 of the time is spend spawing the external Ruby process, if we were spawning JRuby processes that would totally kill performance
  • All Ruby interpreters have different warning/error messages
  • Messages differ even between different versions of the same interpreter

I propose we change this like in the following manner:

  • Use Parser's built-in capabilities to detect syntax errors
  • Implement the most popular warnings from MRI as separate cops(unused variable, statement used in void context, assignment to constant, assignment in a condition, etc).

That shouldn't be too much work, but will make RuboCop run 3 times faster and truly portable.

Thoughts?

@jonas054
Copy link
Collaborator

Sounds really good. Too good to be true? I mean, I just hope you're right that it won't be too much work. But it would be really nice to get the same results on all platforms, and to improve performance.

@lee-dohm
Copy link

I feel like syntax and warnings are something conceptually separate from style checking. Would it make sense to package this as a Lint-style tool or library that RuboCop is intimately tied to? I think that cross-platform, consistent warnings would be useful on its own even separate from RuboCop.

@bbatsov
Copy link
Collaborator Author

bbatsov commented May 28, 2013

I'd never go for invoking an external process - that's just too damn slow. I don't like the idea of depending on another library as well, since this increases the maintenance burden for us(have to keep track of the library's development) and makes us dependent on upstream maintainers. Warnings can be qualified for a stylistic problem of the more serious kind :-)

I don't think that broadening the scope of RuboCop is a bad thing and I feel we're getting more and more qualified to write this kind of stuff :-) We could add some command-line option to RuboCop --lint-only, so people will be able to use it as only a linter if the want to. @jonas054 @lee-dohm How does this sound?

@jonas054
Copy link
Collaborator

Sounds fine.

@lee-dohm
Copy link

👍 Works for me ... just thinking out loud to make sure we think it out.

@bbatsov
Copy link
Collaborator Author

bbatsov commented Jun 4, 2013

@jonas054 Would you like to tackle the first part of this task - removing the Syntax cop and intercepting the parse errors from Parser instead. Afterwards we can split the cops for warnings.

@jonas054
Copy link
Collaborator

jonas054 commented Jun 4, 2013

Sure, I'll try that.

bbatsov pushed a commit that referenced this issue Jun 7, 2013
bbatsov pushed a commit that referenced this issue Jun 8, 2013
@yujinakayama
Copy link
Collaborator

To make a list of the MRI built-in warnings, I've examined MRI 2.0 source codes a bit. Though I have not checked in depth yet, I think this covers most part of the built-in warnings. These are just warnings displayed with ruby -w, not fatal errors.

$ git clone git://github.com/ruby/ruby.git
$ cd ruby
$ git checkout v2_0_0_195
$ egrep --line-number '(rb|parser)_warn' parse.y | egrep -v '# ?define|:parser_warn'
956:                rb_warn0("else without rescue is useless");
1146:               rb_warn0("END in method; use at_exit");
6479:    rb_warning0("ambiguous first argument; put parentheses or even spaces");
6821:    rb_warning0("`"op"' after local variable is interpreted as binary operator"), \
6822:    rb_warning0("even though it seems like "syn""))
6957:       rb_warning0("`**' interpreted as argument prefix");
6976:       rb_warning0("`*' interpreted as argument prefix");
7176:           rb_warnI("invalid character syntax; use ?\\%c", c2);
7239:       rb_warning0("`&' interpreted as argument prefix");
7560:           rb_warningS("Float %s out of range", tok());
8196:   parser_warning(h, "unused literal ignored");
8217:       parser_warning(tail, "statement not reached");
8628:       rb_warningS("shadowing outer local variable - %s", rb_id2name(name));
8790:   rb_warning0("empty expression");
8796:       parser_warning(node, "void value expression");
8930:   rb_warnS("possibly useless use of %s in void context", useless);
9066:   parser_warn(node->nd_value, "found = in conditional, should be ==");
9074:    if (!e_option_supplied(parser)) parser_warn(node, str);
9080:    if (!e_option_supplied(parser)) parser_warning(node, str);
9161:   rb_warn0("string literal in condition");
9185:       parser_warn(node, "range literal in condition");
9191:   parser_warning(node, "literal in condition");
9200:       parser_warning(node, "literal in condition");
9472:   rb_warn4S(ruby_sourcefile, (int)u[i], "assigned but unused variable - %s", rb_id2name(v[i]));
9756:        rb_warningS("named capture conflicts a local variable - %s",

@bbatsov
Copy link
Collaborator Author

bbatsov commented Jun 18, 2013

Good work. Let's how we've done so far:

  • else without rescue is useless
  • END in method; use at_exit
  • ambiguous first argument; put parentheses or even spaces
  • `"op"' after local variable is interpreted as binary operator even though it seems like "syn"
  • `**' interpreted as argument prefix
  • `*' interpreted as argument prefix
  • invalid character syntax; use ?%c
  • `&' interpreted as argument prefix
  • Float %s out of range
  • unused literal ignored
  • statement not reached
  • shadowing outer local variable - %s
  • empty expression
  • void value expression
  • possibly useless use of %s in void context
  • found = in conditional, should be ==
  • string literal in condition
  • range literal in condition
  • literal in condition
  • literal in condition
  • assigned but unused variable - %s
  • named capture conflicts a local variable - %s

At this point it's most important to have "unused local variable" and "shadowing outer local variable" checks, so you might want to start with them.

@yujinakayama
Copy link
Collaborator

I'll try "shadowing outer local variable" next.

@bbatsov
Copy link
Collaborator Author

bbatsov commented Jun 25, 2013

I'll do END in method.

@bbatsov
Copy link
Collaborator Author

bbatsov commented Jun 26, 2013

I'll do literal in condition.

@bbatsov
Copy link
Collaborator Author

bbatsov commented Jun 26, 2013

@yujinakayama Seems we're almost done. I have no idea what the remaining 4 warnings stand for, since I've never seen them.

@jonas054
Copy link
Collaborator

You can get the "ambiguous first argument; put parentheses or even spaces" warning from method calls with a regexp literal argument and no parentheses, such as

func /a/

I think the "even spaces" remark means that in func / a the slash would be interpreted as division.

@yujinakayama
Copy link
Collaborator

@bbatsov Yes, I think we have covered major lint and probably can do the rest of warnings as minor improvements later.

Now I'm thinking about the covered warnings by Parser:

  • Add specs to confirm whether Parser really warn them since Parser may deprecate any of them in the future
  • Name the offences (the old Syntax or name each of them?)
  • Support disabling them with config or comments

@bbatsov
Copy link
Collaborator Author

bbatsov commented Jun 26, 2013

@yujinakayama I've discussing with @whitequark that it might make sense for parser warnings to be removed from Parser and only lexer related warnings to remain. I guess we should add some way to disable them, although we might do that after 0.9 is launched.

Since there has been about a month since 0.8 I feel we should try to wrap 0.9 by the end of the week. We can polish the new features we've introduced so far in an extended series of 0.9.x releases over the next couple of weeks.

@yujinakayama
Copy link
Collaborator

@bbatsov OK. Actually I've been also a bit concerned about unreleased bug fixes.

@whitequark
Copy link

In fact parser only has the following warnings now:

    # Lexer warnings
    :invalid_escape_use      => 'invalid character syntax; use ?%{escape}',
    :ambiguous_literal       => 'ambiguous first argument; parenthesize arguments or add whitespace to the right',
    :ambiguous_prefix        => "`%{prefix}' interpreted as argument prefix",
    # Parser warnings
    :end_in_method           => 'END in method; use at_exit',
    :space_before_lparen     => "don't put space before argument parentheses",
    :useless_else            => 'else without rescue is useless',

@bbatsov
Copy link
Collaborator Author

bbatsov commented Jun 26, 2013

As I mentioned in another comment earlier today, @whitequark, I think that end_in_method and useless_else don't belong in Parser and should be implemented in the tools that use Parser.

@whitequark
Copy link

I generally agree. What about others?

@bbatsov
Copy link
Collaborator Author

bbatsov commented Jun 26, 2013

@yujinakayama Me too. Too many bugfixes are only in the master branch and because of the extensive changes we did for 0.9 they cannot be backported. You know what they say - "Release early, release often.".

@whitequark
Copy link

@bbatsov I've removed space_before_lparen (it was only in 1.8, where it was thought that this syntax will be removed; it weren't) and end_in_method. However I don't think that useless_else can be removed, because it does not go in the AST, right now.

@bbatsov
Copy link
Collaborator Author

bbatsov commented Jun 26, 2013

@whitequark Thanks! I think this will do for now. Once we get the new RuboCop out of the door we can revisit this.

@bbatsov
Copy link
Collaborator Author

bbatsov commented Jul 13, 2013

@yujinakayama Any progress on the remaining checks on the list?

@yujinakayama
Copy link
Collaborator

@bbatsov Not yet. I think I'll revisit this after the current refactoring.

@bbatsov
Copy link
Collaborator Author

bbatsov commented Jun 12, 2014

@yujinakayama Should we keep this ticket open?

@yujinakayama
Copy link
Collaborator

Yes. :) Actually I have a long-running branch for this verification in my local environment. I'll finish it.

@alexdowad
Copy link
Contributor

Lint/AmbiguousOperator has the "ambiguous first argument" warning covered.

@alexdowad
Copy link
Contributor

Style/SpaceAroundOperators seems to warn in almost every circumstance where Ruby gives the "op after local variable is interpreted..." warning. Although the warning is a bit different from Ruby's, we are warning in those cases.

@alexdowad
Copy link
Contributor

When Ruby warns about "Float out of range", RuboCop warns that "Line is too long"... 😆

@alexdowad
Copy link
Contributor

The only way I can figure to get an "empty expression" warning out of Ruby, gives a Lint/ParenthesesAsGroupedExpression warning out of RC.

I think this issue can be closed.

@bbatsov
Copy link
Collaborator Author

bbatsov commented Dec 19, 2015

When Ruby warns about "Float out of range", RuboCop warns that "Line is too long"...

Probably we should have a proper lint for this and we can close this issue.

@alexdowad
Copy link
Contributor

Probably we should have a proper lint for this and we can close this issue.

MRI just relies on libc's strtod to convert from string to float. If strtod returns ERANGE, then Ruby displays the "Float out of range" error.

So what does strtod look like? I'm just looking at glibc's implementation... it's more than 1000 lines of C. Sigh...

@alexdowad
Copy link
Contributor

It looks like glibc somehow computes the number of bits requires for the float's exponent and compares that to the number of bits available (which is platform-dependent). I haven't delved into the details more than that.

IEEE754's standard for double-precision floats gives 11 bits for the exponent and 52 bits for the fraction. A couple of bit patterns have special meanings, and the remaining ones can represent exponents of -1023 up to 1023 (I think). That's 2 ** -1023 up to 2 ** 1023. Multiply that by ((1 + fraction) * sign bit) to get the value of the number.

So any number with more than 1024 binary digits should overflow. If I'm not confused about my math, I think we can divide that by log2(10) to get the maximum number of decimal digits. It should be about 308.

@alexdowad
Copy link
Contributor

I've confirmed that 308-309 decimal digits is where overflow of Floats occurs.

@bbatsov
Copy link
Collaborator Author

bbatsov commented Dec 21, 2015

Guess this is good enough in terms of precision.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants