Drop syntax cop and replace it with separate RuboCop cops #207

bbatsov · 2013-05-28T15:18:50Z

The syntax cop has several major issues:

Running it is extremely slow - 3/4 of the time is spend spawing the external Ruby process, if we were spawning JRuby processes that would totally kill performance
All Ruby interpreters have different warning/error messages
Messages differ even between different versions of the same interpreter

I propose we change this like in the following manner:

Use Parser's built-in capabilities to detect syntax errors
Implement the most popular warnings from MRI as separate cops(unused variable, statement used in void context, assignment to constant, assignment in a condition, etc).

That shouldn't be too much work, but will make RuboCop run 3 times faster and truly portable.

Thoughts?

jonas054 · 2013-05-28T16:05:39Z

Sounds really good. Too good to be true? I mean, I just hope you're right that it won't be too much work. But it would be really nice to get the same results on all platforms, and to improve performance.

lee-dohm · 2013-05-28T16:06:01Z

I feel like syntax and warnings are something conceptually separate from style checking. Would it make sense to package this as a Lint-style tool or library that RuboCop is intimately tied to? I think that cross-platform, consistent warnings would be useful on its own even separate from RuboCop.

bbatsov · 2013-05-28T16:14:44Z

I'd never go for invoking an external process - that's just too damn slow. I don't like the idea of depending on another library as well, since this increases the maintenance burden for us(have to keep track of the library's development) and makes us dependent on upstream maintainers. Warnings can be qualified for a stylistic problem of the more serious kind :-)

I don't think that broadening the scope of RuboCop is a bad thing and I feel we're getting more and more qualified to write this kind of stuff :-) We could add some command-line option to RuboCop --lint-only, so people will be able to use it as only a linter if the want to. @jonas054 @lee-dohm How does this sound?

jonas054 · 2013-05-28T17:52:31Z

Sounds fine.

lee-dohm · 2013-05-28T18:56:48Z

👍 Works for me ... just thinking out loud to make sure we think it out.

bbatsov · 2013-06-04T14:59:48Z

@jonas054 Would you like to tackle the first part of this task - removing the Syntax cop and intercepting the parse errors from Parser instead. Afterwards we can split the cops for warnings.

jonas054 · 2013-06-04T19:17:21Z

Sure, I'll try that.

yujinakayama · 2013-06-17T06:57:19Z

To make a list of the MRI built-in warnings, I've examined MRI 2.0 source codes a bit. Though I have not checked in depth yet, I think this covers most part of the built-in warnings. These are just warnings displayed with ruby -w, not fatal errors.

$ git clone git://github.com/ruby/ruby.git
$ cd ruby
$ git checkout v2_0_0_195
$ egrep --line-number '(rb|parser)_warn' parse.y | egrep -v '# ?define|:parser_warn'
956:                rb_warn0("else without rescue is useless");
1146:               rb_warn0("END in method; use at_exit");
6479:    rb_warning0("ambiguous first argument; put parentheses or even spaces");
6821:    rb_warning0("`"op"' after local variable is interpreted as binary operator"), \
6822:    rb_warning0("even though it seems like "syn""))
6957:       rb_warning0("`**' interpreted as argument prefix");
6976:       rb_warning0("`*' interpreted as argument prefix");
7176:           rb_warnI("invalid character syntax; use ?\\%c", c2);
7239:       rb_warning0("`&' interpreted as argument prefix");
7560:           rb_warningS("Float %s out of range", tok());
8196:   parser_warning(h, "unused literal ignored");
8217:       parser_warning(tail, "statement not reached");
8628:       rb_warningS("shadowing outer local variable - %s", rb_id2name(name));
8790:   rb_warning0("empty expression");
8796:       parser_warning(node, "void value expression");
8930:   rb_warnS("possibly useless use of %s in void context", useless);
9066:   parser_warn(node->nd_value, "found = in conditional, should be ==");
9074:    if (!e_option_supplied(parser)) parser_warn(node, str);
9080:    if (!e_option_supplied(parser)) parser_warning(node, str);
9161:   rb_warn0("string literal in condition");
9185:       parser_warn(node, "range literal in condition");
9191:   parser_warning(node, "literal in condition");
9200:       parser_warning(node, "literal in condition");
9472:   rb_warn4S(ruby_sourcefile, (int)u[i], "assigned but unused variable - %s", rb_id2name(v[i]));
9756:        rb_warningS("named capture conflicts a local variable - %s",

bbatsov · 2013-06-18T13:02:24Z

yujinakayama · 2013-06-22T16:34:00Z

I'll try "shadowing outer local variable" next.

bbatsov · 2013-06-25T15:39:38Z

I'll do END in method.

bbatsov · 2013-06-26T08:56:03Z

I'll do literal in condition.

bbatsov · 2013-06-26T09:41:18Z

@yujinakayama Seems we're almost done. I have no idea what the remaining 4 warnings stand for, since I've never seen them.

jonas054 · 2013-06-26T11:28:04Z

You can get the "ambiguous first argument; put parentheses or even spaces" warning from method calls with a regexp literal argument and no parentheses, such as

func /a/

I think the "even spaces" remark means that in func / a the slash would be interpreted as division.

yujinakayama · 2013-06-26T15:06:02Z

@bbatsov Yes, I think we have covered major lint and probably can do the rest of warnings as minor improvements later.

Now I'm thinking about the covered warnings by Parser:

Add specs to confirm whether Parser really warn them since Parser may deprecate any of them in the future
Name the offences (the old Syntax or name each of them?)
Support disabling them with config or comments

bbatsov · 2013-06-26T15:13:23Z

@yujinakayama I've discussing with @whitequark that it might make sense for parser warnings to be removed from Parser and only lexer related warnings to remain. I guess we should add some way to disable them, although we might do that after 0.9 is launched.

Since there has been about a month since 0.8 I feel we should try to wrap 0.9 by the end of the week. We can polish the new features we've introduced so far in an extended series of 0.9.x releases over the next couple of weeks.

yujinakayama · 2013-06-26T15:21:43Z

@bbatsov OK. Actually I've been also a bit concerned about unreleased bug fixes.

whitequark · 2013-06-26T15:31:26Z

In fact parser only has the following warnings now:

    # Lexer warnings
    :invalid_escape_use      => 'invalid character syntax; use ?%{escape}',
    :ambiguous_literal       => 'ambiguous first argument; parenthesize arguments or add whitespace to the right',
    :ambiguous_prefix        => "`%{prefix}' interpreted as argument prefix",
    # Parser warnings
    :end_in_method           => 'END in method; use at_exit',
    :space_before_lparen     => "don't put space before argument parentheses",
    :useless_else            => 'else without rescue is useless',

bbatsov · 2013-06-26T15:34:04Z

As I mentioned in another comment earlier today, @whitequark, I think that end_in_method and useless_else don't belong in Parser and should be implemented in the tools that use Parser.

whitequark · 2013-06-26T15:35:22Z

I generally agree. What about others?

bbatsov · 2013-06-26T15:36:07Z

@yujinakayama Me too. Too many bugfixes are only in the master branch and because of the extensive changes we did for 0.9 they cannot be backported. You know what they say - "Release early, release often.".

whitequark · 2013-06-26T15:39:59Z

@bbatsov I've removed space_before_lparen (it was only in 1.8, where it was thought that this syntax will be removed; it weren't) and end_in_method. However I don't think that useless_else can be removed, because it does not go in the AST, right now.

bbatsov · 2013-06-26T15:41:55Z

@whitequark Thanks! I think this will do for now. Once we get the new RuboCop out of the door we can revisit this.

bbatsov · 2013-07-13T06:32:05Z

@yujinakayama Any progress on the remaining checks on the list?

yujinakayama · 2013-07-13T06:38:10Z

@bbatsov Not yet. I think I'll revisit this after the current refactoring.

bbatsov · 2014-06-12T08:57:54Z

@yujinakayama Should we keep this ticket open?

yujinakayama · 2014-06-12T09:08:16Z

Yes. :) Actually I have a long-running branch for this verification in my local environment. I'll finish it.

alexdowad · 2015-12-16T11:18:28Z

Lint/AmbiguousOperator has the "ambiguous first argument" warning covered.

alexdowad · 2015-12-16T11:26:12Z

Style/SpaceAroundOperators seems to warn in almost every circumstance where Ruby gives the "op after local variable is interpreted..." warning. Although the warning is a bit different from Ruby's, we are warning in those cases.

alexdowad · 2015-12-16T11:28:47Z

When Ruby warns about "Float out of range", RuboCop warns that "Line is too long"... 😆

alexdowad · 2015-12-16T11:33:42Z

The only way I can figure to get an "empty expression" warning out of Ruby, gives a Lint/ParenthesesAsGroupedExpression warning out of RC.

I think this issue can be closed.

bbatsov · 2015-12-19T09:57:46Z

When Ruby warns about "Float out of range", RuboCop warns that "Line is too long"...

Probably we should have a proper lint for this and we can close this issue.

alexdowad · 2015-12-21T10:06:48Z

Probably we should have a proper lint for this and we can close this issue.

MRI just relies on libc's strtod to convert from string to float. If strtod returns ERANGE, then Ruby displays the "Float out of range" error.

So what does strtod look like? I'm just looking at glibc's implementation... it's more than 1000 lines of C. Sigh...

alexdowad · 2015-12-21T11:08:42Z

It looks like glibc somehow computes the number of bits requires for the float's exponent and compares that to the number of bits available (which is platform-dependent). I haven't delved into the details more than that.

IEEE754's standard for double-precision floats gives 11 bits for the exponent and 52 bits for the fraction. A couple of bit patterns have special meanings, and the remaining ones can represent exponents of -1023 up to 1023 (I think). That's 2 ** -1023 up to 2 ** 1023. Multiply that by ((1 + fraction) * sign bit) to get the value of the number.

So any number with more than 1024 binary digits should overflow. If I'm not confused about my math, I think we can divide that by log2(10) to get the maximum number of decimal digits. It should be about 308.

alexdowad · 2015-12-21T12:22:18Z

I've confirmed that 308-309 decimal digits is where overflow of Floats occurs.

bbatsov · 2015-12-21T12:28:29Z

Guess this is good enough in terms of precision.

jonas054 mentioned this issue Jun 5, 2013

Syntax check by parser #237

Merged

bbatsov pushed a commit that referenced this issue Jun 7, 2013

New cop AssignmentInCondition (#207)

033fe88

bbatsov pushed a commit that referenced this issue Jun 8, 2013

New cop VoidOp(#207)

853f5e4

ghost assigned yujinakayama Aug 8, 2013

yujinakayama mentioned this issue Sep 26, 2013

Support disabling Syntax offences with warning severity #510

Merged

bbatsov closed this as completed in 36a4723 Dec 21, 2015

Drop syntax cop and replace it with separate RuboCop cops #207

Drop syntax cop and replace it with separate RuboCop cops #207

Comments

bbatsov commented May 28, 2013

jonas054 commented May 28, 2013

lee-dohm commented May 28, 2013

bbatsov commented May 28, 2013

jonas054 commented May 28, 2013

lee-dohm commented May 28, 2013

bbatsov commented Jun 4, 2013

jonas054 commented Jun 4, 2013

yujinakayama commented Jun 17, 2013

bbatsov commented Jun 18, 2013

yujinakayama commented Jun 22, 2013

bbatsov commented Jun 25, 2013

bbatsov commented Jun 26, 2013

bbatsov commented Jun 26, 2013

jonas054 commented Jun 26, 2013

yujinakayama commented Jun 26, 2013

bbatsov commented Jun 26, 2013

yujinakayama commented Jun 26, 2013

whitequark commented Jun 26, 2013

bbatsov commented Jun 26, 2013

whitequark commented Jun 26, 2013

bbatsov commented Jun 26, 2013

whitequark commented Jun 26, 2013

bbatsov commented Jun 26, 2013

bbatsov commented Jul 13, 2013

yujinakayama commented Jul 13, 2013

bbatsov commented Jun 12, 2014

yujinakayama commented Jun 12, 2014

alexdowad commented Dec 16, 2015

alexdowad commented Dec 16, 2015

alexdowad commented Dec 16, 2015

alexdowad commented Dec 16, 2015

bbatsov commented Dec 19, 2015

alexdowad commented Dec 21, 2015

alexdowad commented Dec 21, 2015

alexdowad commented Dec 21, 2015

bbatsov commented Dec 21, 2015