Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-20278][R] Disable 'multiple_dots_linter' lint rule that is against project's code style #17590

Closed
wants to merge 1 commit into from

Conversation

HyukjinKwon
Copy link
Member

@HyukjinKwon HyukjinKwon commented Apr 10, 2017

What changes were proposed in this pull request?

Currently, multi-dot separated variables in R is not allowed. For example,

 setMethod("from_json", signature(x = "Column", schema = "structType"),
-          function(x, schema, asJsonArray = FALSE, ...) {
+          function(x, schema, as.json.array = FALSE, ...) {
             if (asJsonArray) {
               jschema <- callJStatic("org.apache.spark.sql.types.DataTypes",
                                      "createArrayType",

produces an error as below:

R/functions.R:2462:31: style: Words within variable and function names should be separated by '_' rather than '.'.
          function(x, schema, as.json.array = FALSE, ...) {
                              ^~~~~~~~~~~~~

This seems against https://google.github.io/styleguide/Rguide.xml#identifiers which says

The preferred form for variable names is all lower case letters and words separated with dots

This looks because lintr by default https://github.com/jimhester/lintr follows http://r-pkgs.had.co.nz/style.html as written in the README.md. Few cases seems not following Google's one as "a few tweaks".

Per SPARK-6813, we follow Google's R Style Guide with few exceptions https://google.github.io/styleguide/Rguide.xml. This is also merged into Spark's website - apache/spark-website#43

Also, it looks we have no limit on function name. This rule also looks affecting to the name of functions as written in the README.md.

multiple_dots_linter: check that function and variable names are separated by _ rather than ..

How was this patch tested?

Manually tested ./dev/lint-rwith the manual change below in R/functions.R:

 setMethod("from_json", signature(x = "Column", schema = "structType"),
-          function(x, schema, asJsonArray = FALSE, ...) {
+          function(x, schema, as.json.array = FALSE, ...) {
             if (asJsonArray) {
               jschema <- callJStatic("org.apache.spark.sql.types.DataTypes",
                                      "createArrayType",

Before

R/functions.R:2462:31: style: Words within variable and function names should be separated by '_' rather than '.'.
          function(x, schema, as.json.array = FALSE, ...) {
                              ^~~~~~~~~~~~~

After

lintr checks passed.

@HyukjinKwon
Copy link
Member Author

cc @felixcheung, could you take a look and see if it makes sense to you?

@SparkQA
Copy link

SparkQA commented Apr 10, 2017

Test build #75656 has finished for PR 17590 at commit ed0dd86.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung
Copy link
Member

felixcheung commented Apr 12, 2017 via email

@HyukjinKwon
Copy link
Member Author

HyukjinKwon commented Apr 12, 2017

I think it is because it only checks multiple dots and the case I described above was finally found before (in my previous PR related with from_json function).

I think multiple dots are still valid per ...

The preferred form for variable names is all lower case letters and words separated with dots

Yea, maybe. I am fine with leaving this open for some days more to see if there is any objection. Probably, let me cc who I saw made many contributions to SparkR. cc @shivaram, @yanboliang, @wangmiao1981 and @actuaryzhang here. Please let me know if there is any concern.

@actuaryzhang
Copy link
Contributor

The change is fine to me. With this, we can define function/argument names using multiple styles such as as.json.array, as_json_array, asJsonArray. Is there a preferred style among these?

@HyukjinKwon
Copy link
Member Author

HyukjinKwon commented Apr 12, 2017

This is my thought on it.

  • variable names
    in SPARK-6813 ...

    We could have a R style guide based on the one from google ...

    in Google's R Style Guide

    variable.name is preferred, variableName is accepted
    GOOD: avg.clicks
    OK: avgClicks

    So, I guess this means ...

    • as.json.array : acceptable - contributors are fine to propose this and committers decide it.

    • asJsonArray : acceptable - contributors are fine to propose this and committers decide it.

    • as_json_array : not preferred

    although, I think we should not say avg.clicks is particularly better but defer to committers to decide it.

  • function names
    in SPARK-6813 ...

    2.no limit on function name (API should be similar as in other languages)

    As there is no limit on this, any form is allowed but we follow existing APIs forms in other languages at our best.

Note that, I think this does not necessarily mean we should sweep all existing avg_clicks form variables if they are such arguments in the current APIs as it breaks backwards compatibility and also I don't think it is worth introducing other alternative aliases just to follow the style.

in Google's R Style Guide

The coding conventions described above should be followed, unless there is good reason to do otherwise.

@shivaram
Copy link
Contributor

I think @HyukjinKwon interpretation is good !

@felixcheung
Copy link
Member

felixcheung commented Apr 16, 2017

Thanks @HyukjinKwon for addressing this and @actuaryzhang, @shivaram for chiming in.
I definitely think this is the right step to take and we could use more documentation in terms of specific style consistency we should have that is not enforced by lintr.

@felixcheung
Copy link
Member

merged to master.
@HyukjinKwon would you like to follow up and change asJsonArray in master?

@asfgit asfgit closed this in 86d251c Apr 16, 2017
@HyukjinKwon
Copy link
Member Author

Sure, thank you.

@HyukjinKwon HyukjinKwon deleted the disable-dot-in-name branch January 2, 2018 03:38
peter-toth pushed a commit to peter-toth/spark that referenced this pull request Oct 6, 2018
…inst project's code style

## What changes were proposed in this pull request?

Currently, multi-dot separated variables in R is not allowed. For example,

```diff
 setMethod("from_json", signature(x = "Column", schema = "structType"),
-          function(x, schema, asJsonArray = FALSE, ...) {
+          function(x, schema, as.json.array = FALSE, ...) {
             if (asJsonArray) {
               jschema <- callJStatic("org.apache.spark.sql.types.DataTypes",
                                      "createArrayType",
```

produces an error as below:

```
R/functions.R:2462:31: style: Words within variable and function names should be separated by '_' rather than '.'.
          function(x, schema, as.json.array = FALSE, ...) {
                              ^~~~~~~~~~~~~
```

This seems against https://google.github.io/styleguide/Rguide.xml#identifiers which says

> The preferred form for variable names is all lower case letters and words separated with dots

This looks because lintr by default https://github.com/jimhester/lintr follows http://r-pkgs.had.co.nz/style.html as written in the README.md. Few cases seems not following Google's one as "a few tweaks".

Per [SPARK-6813](https://issues.apache.org/jira/browse/SPARK-6813), we follow Google's R Style Guide with few exceptions https://google.github.io/styleguide/Rguide.xml. This is also merged into Spark's website - apache/spark-website#43

Also, it looks we have no limit on function name. This rule also looks affecting to the name of functions as written in the README.md.

> `multiple_dots_linter`: check that function and variable names are separated by _ rather than ..

## How was this patch tested?

Manually tested `./dev/lint-r`with the manual change below in `R/functions.R`:

```diff
 setMethod("from_json", signature(x = "Column", schema = "structType"),
-          function(x, schema, asJsonArray = FALSE, ...) {
+          function(x, schema, as.json.array = FALSE, ...) {
             if (asJsonArray) {
               jschema <- callJStatic("org.apache.spark.sql.types.DataTypes",
                                      "createArrayType",
```

**Before**

```R
R/functions.R:2462:31: style: Words within variable and function names should be separated by '_' rather than '.'.
          function(x, schema, as.json.array = FALSE, ...) {
                              ^~~~~~~~~~~~~
```

**After**

```
lintr checks passed.
```

Author: hyukjinkwon <[email protected]>

Closes apache#17590 from HyukjinKwon/disable-dot-in-name.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants