
Fixed coding style issues in Spark SQL #208

Closed
wants to merge 6 commits

Conversation

liancheng
Contributor

This PR addresses various coding style issues in Spark SQL, including but not limited to those mentioned by @mateiz in PR #146.

As this PR affects lots of source files and may cause potential conflicts, it would be better to merge this as soon as possible after PR #205 (In-memory columnar representation for Spark SQL) is merged.

@liancheng
Contributor Author

All comments from @marmbrus were addressed, thanks for all the suggestions! Hope this can be a good foundation for future contributions :)

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13364/

Conflicts:
	sql/core/src/main/scala/org/apache/spark/sql/execution/Exchange.scala
@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@liancheng
Contributor Author

@pwendell Merged most recent master, should be ready to be merged if Jenkins is happy.

@marmbrus
Contributor

LGTM

@AmplabJenkins

Merged build finished.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13372/

@pwendell
Contributor

Thanks, merged.

asfgit closed this in 8265dc7 Mar 23, 2014
@rxin
Contributor

rxin commented Mar 24, 2014

To make it consistent with the rest of Spark, can you also submit a PR to update the "package" definitions to use fully qualified packages?

@marmbrus
Contributor

They are fully qualified, as there is only one possible path that will define the package for a given file. For this reason I'm not sure the logic behind avoiding relative imports applies here.

The package clause is just broken up to implicitly include things from the package objects of parent packages. I think this is a useful construct for including system-wide things like logging or type aliases without adding verbosity to every file in the project.

@rxin
Contributor

rxin commented Mar 24, 2014

package org.apache.spark.sql
package catalyst

vs

package org.apache.spark.sql.catalyst

There are three reasons I think it'd be better to just have one package statement in a file.

  1. It is easier to understand by most programmers, especially the ones that come from Java land (I was chatting with another committer just now and he got confused by the meaning of having two or three package statements in a Scala file).
  2. It requires explicit import to open up the parent package scope and avoids polluting the namespace (there is no difference in terms of line count here since you add one import but remove one package)
  3. It is more consistent with the rest of Spark code base.

Now this is a highly subjective topic, so we should get others to chime in.
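For readers following along, the scoping difference can be demonstrated in one self-contained file using package blocks, which behave like chained package clauses. This is an illustrative sketch with hypothetical package and object names, not actual Spark code: with the chained/nested form, members of the parent package are visible without any import.

```scala
// Nested package blocks behave like chained package clauses:
// everything declared in org.example.sql is in scope inside catalyst.
package org.example.sql {
  object Shared { val answer = 42 }

  package catalyst {
    object ChainedDemo {
      // `Shared` resolves without an import because the enclosing
      // package block opens org.example.sql's scope.
      def value: Int = Shared.answer
    }
  }
}
```

With a single fully qualified clause (`package org.example.sql.catalyst`), the same reference would not compile until `org.example.sql.Shared` is imported explicitly, which is the namespace-pollution point above.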

@pwendell
Contributor

Hey I just want to make sure I understand. It seems like there are two options here:

package org.apache.spark.sql
package catalyst

and

package org.apache.spark.sql.catalyst

import org.apache.spark.sql._

And these accomplish the same goal. Is that correct?

If the trade-off is to save ~15 characters vs having something more explicit, I definitely prefer just being explicit. At least in the past we've tried to err on the side of making things accessible.
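The two options do accomplish the same goal; the second just makes the dependency explicit. A hedged sketch with hypothetical names (not actual Spark files), squeezed into one file via package blocks so it compiles standalone:

```scala
package org.example.sql {
  object Shared { val answer = 42 }
}

// A fully qualified package clause does NOT open the parent package's
// scope, so the import below is required (and greppable).
package org.example.sql.catalyst {
  import org.example.sql.Shared

  object ExplicitDemo {
    def value: Int = Shared.answer
  }
}
```

Deleting the `import` line makes `Shared` unresolvable, which is exactly the extra explicitness being traded for those ~15 characters.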

@liancheng
Contributor Author

@pwendell When I started on this PR, I was also puzzled about which option to use at first, until I saw the same usage in the Scala standard library. But I should confess that I wasn't 100% sure what the first option exactly means until I played around with both of them a bit.

I think the most significant "advantage" of the first option is that we open parent packages implicitly. But since we forbid relative package imports, this is not exactly an advantage any more. So I vote for the second option now.

@liancheng
Contributor Author

I think a point here is whether we should assume all contributors use IDEs like IntelliJ and rely on their automated features. At least, I find that in most scenarios the default behaviours of IntelliJ match the Spark coding convention well. Exceptions include indentation and false-positive suggestions about adding/removing parentheses to/from Java getter/setter methods.

Maybe we can suggest that developers (including ourselves) rely on the IDE more, and even provide a default IntelliJ configuration that matches the Spark coding convention better?

@shivaram
Contributor

FWIW I'd vote for package org.apache.spark.sql.catalyst and not assuming that everybody uses an IDE

@aarondav
Contributor

Also vote for package org.apache.spark.sql.catalyst, for new developers and to keep consistent with the rest of the Spark codebase.
edit: prematurely submitted

@alig
Contributor

alig commented Mar 24, 2014

Also +1 for package org.apache.spark.sql.catalyst, just because it's simpler to understand for the majority of the programmers in the world ;)

@liancheng
Contributor Author

Thanks for all your votes! I'll fix this in a separate PR ASAP.

asfgit pushed a commit that referenced this pull request Mar 26, 2014
According to discussions in comments of PR #208, this PR unifies package definition format in Spark SQL.

Some broken links in ScalaDoc and typos detected along the way are also fixed.

Author: Cheng Lian <[email protected]>

Closes #225 from liancheng/packageDefinition and squashes the following commits:

75c47b3 [Cheng Lian] Fixed file line length
4f87968 [Cheng Lian] Unified package definition format in Spark SQL
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
liancheng deleted the fixCodingStyle branch September 24, 2014 00:15
davies pushed a commit to davies/spark that referenced this pull request Apr 14, 2015
[SPARKR-188] Add profiling of R execution on worker side
Conflicts:
	pkg/inst/worker/worker.R
asfgit pushed a commit that referenced this pull request Apr 17, 2015
This PR pulls in recent changes in SparkR-pkg, including

cartesian, intersection, sampleByKey, subtract, subtractByKey, except, and some API for StructType and StructField.

Author: cafreeman <[email protected]>
Author: Davies Liu <[email protected]>
Author: Zongheng Yang <[email protected]>
Author: Shivaram Venkataraman <[email protected]>
Author: Shivaram Venkataraman <[email protected]>
Author: Sun Rui <[email protected]>

Closes #5436 from davies/R3 and squashes the following commits:

c2b09be [Davies Liu] SQLTypes -> schema
a5a02f2 [Davies Liu] Merge branch 'master' of github.com:apache/spark into R3
168b7fe [Davies Liu] sort generics
b1fe460 [Davies Liu] fix conflict in README.md
e74c04e [Davies Liu] fix schema.R
4f5ac09 [Davies Liu] Merge branch 'master' of github.com:apache/spark into R5
41f8184 [Davies Liu] rm man
ae78312 [Davies Liu] Merge pull request #237 from sun-rui/SPARKR-154_3
1bdcb63 [Zongheng Yang] Updates to README.md.
5a553e7 [cafreeman] Use object attribute instead of argument
71372d9 [cafreeman] Update docs and examples
8526d2e [cafreeman] Remove `tojson` functions
6ef5f2d [cafreeman] Fix spacing
7741d66 [cafreeman] Rename the SQL DataType function
141efd8 [Shivaram Venkataraman] Merge pull request #245 from hqzizania/upstream
9387402 [Davies Liu] fix style
40199eb [Shivaram Venkataraman] Move except into sorted position
07d0dbc [Sun Rui] [SPARKR-244] Fix test failure after integration of subtract() and subtractByKey() for RDD.
7e8caa3 [Shivaram Venkataraman] Merge pull request #246 from hlin09/fixCombineByKey
ed66c81 [cafreeman] Update `subtract` to work with `generics.R`
f3ba785 [cafreeman] Fixed duplicate export
275deb4 [cafreeman] Update `NAMESPACE` and tests
1a3b63d [cafreeman] new version of `CreateDF`
836c4bf [cafreeman] Update `createDataFrame` and `toDF`
be5d5c1 [cafreeman] refactor schema functions
40338a4 [Zongheng Yang] Merge pull request #244 from sun-rui/SPARKR-154_5
20b97a6 [Zongheng Yang] Merge pull request #234 from hqzizania/assist
ba54e34 [Shivaram Venkataraman] Merge pull request #238 from sun-rui/SPARKR-154_4
c9497a3 [Shivaram Venkataraman] Merge pull request #208 from lythesia/master
b317aa7 [Zongheng Yang] Merge pull request #243 from hqzizania/master
136a07e [Zongheng Yang] Merge pull request #242 from hqzizania/stats
cd66603 [cafreeman] new line at EOF
8b76e81 [Shivaram Venkataraman] Merge pull request #233 from redbaron/fail-early-on-missing-dep
7dd81b7 [cafreeman] Documentation
0e2a94f [cafreeman] Define functions for schema and fields
mccheah pushed a commit to mccheah/spark that referenced this pull request Oct 12, 2017
Resync kube upstream and fix SBT build
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019
This is to remove related jobs from theopenlab repo
8 participants