Fixed coding style issues in Spark SQL #208
Conversation
All comments from @marmbrus were addressed, thanks for all the suggestions! Hope this can be a good foundation for future contributions :)

Merged build triggered.

Merged build started.

Merged build finished.

All automated tests passed.
Conflicts: sql/core/src/main/scala/org/apache/spark/sql/execution/Exchange.scala
Merged build triggered.

Merged build started.

@pwendell Merged most recent master, should be ready to be merged if Jenkins is happy.

LGTM

Merged build finished.

All automated tests passed.

Thanks, merged.

To make it consistent with the rest of Spark, can you also submit a PR to update the "package" definitions to use fully qualified packages?
They are fully qualified, as there is only one possible path that will define the package for a given file. For this reason I'm not sure the logic behind avoiding relative imports applies here. The breaks between package clauses are there to implicitly include things from the package objects of parent packages. I think this is a useful construct for including system-wide things like logging or type aliases without adding verbosity to every file in the project.
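The parent-package-object behaviour described above can be sketched in a small, self-contained example. The package names here (`demo`, `inner`, `Report`) are hypothetical, not from the Spark codebase: a type alias defined in a parent package object is visible in child packages without any import.

```scala
// Hypothetical packages (not Spark's) sketching how a parent package
// object can expose system-wide names, such as a type alias, to every
// child package without an explicit import.

package object demo {
  // A project-wide alias that child packages pick up automatically.
  type ErrorMessage = String
}

package demo {
  package inner {
    object Report {
      // ErrorMessage resolves through the enclosing `demo` package
      // object; no `import demo.ErrorMessage` is required.
      def format(e: ErrorMessage): String = s"error: $e"
    }
  }
}

object Main extends App {
  assert(demo.inner.Report.format("oops") == "error: oops")
  println("ok")
}
```

A file that opens its package with chained clauses (`package org.apache.spark.sql` followed by `package catalyst`) gets the same effect for its real parent packages.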
The question is:

```scala
package org.apache.spark.sql
package catalyst
```

vs.

```scala
package org.apache.spark.sql.catalyst
```

There are three reasons I think it'd be better to just have one package statement in a file.
Now this is a highly subjective topic, so we should get others to chime in.
Hey, I just want to make sure I understand. It seems like there are two options here: chained package clauses (`package org.apache.spark.sql` followed by `package catalyst`) and a single fully qualified clause (`package org.apache.spark.sql.catalyst`). And these accomplish the same goal. Is that correct? If the trade-off is saving ~15 characters vs. having something more explicit, I definitely prefer just being explicit. At least in the past we've tried to err on the side of making things accessible.
@pwendell When I started on this PR, I was also puzzled about which option to use at first, until I saw the same usage in the Scala standard library. But I should confess that I wasn't 100% sure what the first option exactly means until I played around with both of them a bit. I think the most significant "advantage" of the first option is that it opens parent packages implicitly. But since we forbid relative package imports, this is not exactly an advantage any more. So I vote for the second option now.
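What "opening parent packages implicitly" buys, and why it cuts against the rule forbidding relative imports, can be shown with another hypothetical sketch (the `com.example` packages below are made up for illustration): once a parent package is an enclosing scope, sibling packages resolve by their relative names with no import at all.

```scala
// Hypothetical packages (not Spark's). Because `com.example` encloses
// `core`, the sibling package `util` is reachable by its relative name,
// which is exactly the kind of relative reference a single fully
// qualified package clause avoids.

package com.example {
  package util {
    object Log { def msg: String = "logged" }
  }
  package core {
    object Job {
      // Resolves to com.example.util.Log with no import at all.
      def run: String = util.Log.msg
    }
  }
}

object Main extends App {
  assert(com.example.core.Job.run == "logged")
  println("ok")
}
```

With the single fully qualified clause, the reference would have to be written out as `com.example.util.Log`, making its origin explicit to the reader.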
I think a point here is whether we should assume all contributors use IDEs like IntelliJ and their automated features. At least, I find that in most scenarios the default behaviours of IntelliJ match the Spark coding convention well. Exceptions include indentation and false-positive suggestions about adding/removing parentheses to/from Java getter/setter methods. Maybe we can suggest that developers (including ourselves) rely on the IDE more, and even provide a default IntelliJ configuration that matches the Spark coding convention better?
FWIW I'd vote for the single fully qualified package statement.

Also vote for the fully qualified form.

Also +1 for the fully qualified form.

Thanks for all your votes! I'll fix this in a separate PR ASAP.
According to discussions in comments of PR #208, this PR unifies the package definition format in Spark SQL. Some broken links in ScalaDoc and typos detected along the way are also fixed.

Author: Cheng Lian <[email protected]>

Closes #225 from liancheng/packageDefinition and squashes the following commits:

75c47b3 [Cheng Lian] Fixed file line length
4f87968 [Cheng Lian] Unified package definition format in Spark SQL
This PR addresses various coding style issues in Spark SQL, including but not limited to those mentioned by @mateiz in PR #146.
As this PR affects lots of source files and may cause potential conflicts, it would be better to merge this as soon as possible after PR #205 (In-memory columnar representation for Spark SQL) is merged.