
[DOCS] Docs-only improvements #17417

Closed

Conversation

jaceklaskowski
Contributor

…adoc

What changes were proposed in this pull request?

Use recommended values for row boundaries in Window's scaladoc, i.e. Window.unboundedPreceding, Window.unboundedFollowing, and Window.currentRow (that were introduced in 2.1.0).
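For readers outside the PR: the three constants are sentinels (`Window.unboundedPreceding` is `Long.MinValue`, `Window.unboundedFollowing` is `Long.MaxValue`, and `Window.currentRow` is `0L`). Below is a Spark-free sketch of the row-frame arithmetic they stand in for; the `FrameSketch` object and its `frame` function are illustrative inventions for this discussion, not Spark's implementation.

```scala
// Spark-free sketch of row-frame boundaries. The sentinel values match
// Spark's Window constants; the frame function itself is illustrative only.
object FrameSketch {
  val unboundedPreceding: Long = Long.MinValue  // Window.unboundedPreceding
  val unboundedFollowing: Long = Long.MaxValue  // Window.unboundedFollowing
  val currentRow: Long = 0L                     // Window.currentRow

  // Indices covered by rowsBetween(start, end) for the row at index i in a
  // partition of n rows; both boundaries are inclusive and relative to the
  // current row, clamped to the partition.
  def frame(n: Int, i: Int, start: Long, end: Long): Range = {
    val lo = if (start == unboundedPreceding) 0 else math.max(0, i + start.toInt)
    val hi = if (end == unboundedFollowing) n - 1 else math.min(n - 1, i + end.toInt)
    lo to hi
  }
}
```

For example, `frame(5, 2, FrameSketch.currentRow, 1)` covers indices 2 to 3, matching `rowsBetween(Window.currentRow, 1)` for the middle row of a five-row partition.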

How was this patch tested?

Local build

@SparkQA

SparkQA commented Mar 24, 2017

Test build #75180 has finished for PR 17417 at commit efc420b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 25, 2017

Test build #75183 has finished for PR 17417 at commit 07a36f8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@srowen srowen left a comment

I'm not sure any of these changes are necessary?

@@ -22,7 +22,7 @@ import org.apache.spark.sql.Column
import org.apache.spark.sql.catalyst.expressions._

/**
* Utility functions for defining window in DataFrames.
* Utility functions for defining window in Datasets.
Member

These are used with DataFrames, right? At least that's what I have used Window for, am I missing something?

Contributor Author

Sure. I'm going to revert the changes. There's little value in them. I'd rather see the changes approved in general than fight over DataFrame vs Dataset.

p.s. There's no DataFrame class in Spark SQL; it's just a type alias for Dataset[Row] -- see https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/package.scala#L46.

Member

It's an alias, yes, but it certainly exists as a user-facing type.
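The alias point can be seen in miniature. The names below are stand-ins for illustration, not Spark's actual classes:

```scala
// A type alias is fully interchangeable with the type it names, which is
// why the DataFrame/Dataset[Row] distinction is cosmetic at the type level.
// Row, Dataset and DataFrame here are stand-ins, not Spark's definitions.
object AliasSketch {
  class Row
  class Dataset[T]
  type DataFrame = Dataset[Row]          // cf. org.apache.spark.sql.package

  val df: DataFrame = new Dataset[Row]   // a DataFrame is a Dataset[Row]...
  val ds: Dataset[Row] = df              // ...and assigns back with no cast
}
```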

@@ -113,12 +113,12 @@ object Window {
* Creates a [[WindowSpec]] with the frame boundaries defined,
* from `start` (inclusive) to `end` (inclusive).
*
* Both `start` and `end` are relative positions from the current row. For example, "0" means
* Both `start` and `end` are relative positions to the current row. For example, "0" means
Member

I think the right phrasing is: "are positions relative to the current row". The current text is OK IMHO

Contributor Author

I'll fix it to be more accurate (that's the purpose of this particular change, so the more accurate the merrier). Thanks!

* "current row", while "-1" means the row before the current row, and "5" means the fifth row
* after the current row.
*
* We recommend users use `Window.unboundedPreceding`, `Window.unboundedFollowing`,
* and `Window.currentRow` to specify special boundary values, rather than using integral
* We recommend users to use [[Window.unboundedPreceding]], [[Window.unboundedFollowing]],
Member

"We recommend that users use" is correct, but 'that' can be omitted and it's still correct.
I think the backticks are on purpose as many scaladoc refs like this also cause doc failures. At least you need to verify this before changing.

Contributor Author

Leaving 'that' out is incorrect -- see http://dictionary.cambridge.org/dictionary/english/recommend, where 'to' is even highlighted to make the point.

* sum('id) over Window.partitionBy('category).orderBy('id).rowsBetween(0,1))
* .show()
* val byCategoryOrderedById =
* Window.partitionBy('category).orderBy('id).rowsBetween(Window.currentRow, 1)
Member

Why this change? 0 should mean current row.

Contributor Author

See the above change where Spark devs "recommend that users use" the values by their aliases not their numeric values.

Member

Got it, and I think there are also doc examples like this in Column.scala and WindowSpec.scala that could be similarly improved
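For intuition, a frame of rowsBetween(Window.currentRow, 1) aggregates each row together with the one row after it. A plain-Scala sketch of just that frame arithmetic follows; partitioning and ordering, which Spark handles, are deliberately ignored here.

```scala
// Plain-Scala sketch of the frame rowsBetween(Window.currentRow, 1):
// each element is summed with its immediate successor, when one exists.
// This mirrors only the frame arithmetic, not Spark's execution model.
object SlidingSketch {
  val ids = Seq(10, 20, 30, 40)
  val slidingSums: Seq[Int] = ids.indices.map { i =>
    ids.slice(i, math.min(i + 2, ids.length)).sum
  }
  // slidingSums == Seq(30, 50, 70, 40); the last row has no successor
}
```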

@SparkQA

SparkQA commented Mar 25, 2017

Test build #75217 has finished for PR 17417 at commit 07001a9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

* while "-1" means one off before the current row, and "5" means the five off after the
* current row.
*
* We recommend users use `Window.unboundedPreceding`, `Window.unboundedFollowing`,
* We recommend that users use `Window.unboundedPreceding`, `Window.unboundedFollowing`,
Member

Either way is correct, it wasn't wrong

@@ -200,9 +200,9 @@ object Window {
* }}}
*
* @param start boundary start, inclusive. The frame is unbounded if this is
* the minimum long value (`Window.unboundedPreceding`).
* the minimum long value, i.e. `Window.unboundedPreceding`.
Member

Likewise this is effectively identical. I wouldn't make changes like this

@SparkQA

SparkQA commented Mar 26, 2017

Test build #75240 has finished for PR 17417 at commit bf82dc6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jaceklaskowski
Contributor Author

Hey @srowen, I'd appreciate your looking at the changes again and commenting (or merging). Thanks!

@srowen
Member

srowen commented Mar 27, 2017

How about the other files I mentioned? I think they can take similar changes. I think you can roll your other PR into this. They're both kinda misc doc improvements.

@jaceklaskowski
Contributor Author

I'm going to merge the two PRs with your comments applied (i.e. excluding changes that are not strictly doc-only). Thanks a lot for your time, Sean. Much appreciated.

@jaceklaskowski jaceklaskowski changed the title [SQL][DOC] Use recommended values for row boundaries in Window's scal… [DOC] Doc-only improvements Mar 29, 2017
@jaceklaskowski jaceklaskowski changed the title [DOC] Doc-only improvements [DOCS] Doc-only improvements Mar 29, 2017
@jaceklaskowski jaceklaskowski changed the title [DOCS] Doc-only improvements [DOCS] Docs-only improvements Mar 29, 2017
@SparkQA

SparkQA commented Mar 29, 2017

Test build #75352 has finished for PR 17417 at commit 0c4a77e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait Source

@SparkQA

SparkQA commented Mar 29, 2017

Test build #75353 has finished for PR 17417 at commit 8dc1f04.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jaceklaskowski
Contributor Author

Executed cd docs && SKIP_PYTHONDOC=1 SKIP_RDOC=1 jekyll serve to check the changes, and they seemed fine. I had to fix some extra javadoc-related places to please jekyll.

@srowen Ready to review the changes once more? Thanks.

@SparkQA

SparkQA commented Mar 29, 2017

Test build #75354 has finished for PR 17417 at commit e09802d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 29, 2017

Test build #75355 has finished for PR 17417 at commit db426e3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@srowen srowen left a comment

Most of this looks fine to me, but I have a few questions still about the changes.

@@ -52,16 +52,15 @@
* This class implements sort-based shuffle's hash-style shuffle fallback path. This write path
* writes incoming records to separate files, one file per reduce partition, then concatenates these
* per-partition files to form a single output file, regions of which are served to reducers.
* Records are not buffered in memory. This is essentially identical to
* {@link org.apache.spark.shuffle.hash.HashShuffleWriter}, except that it writes output in a format
* Records are not buffered in memory. It writes output in a format
Member

Why remove this particular comment?

Contributor Author

HashShuffleWriter is long gone.

@@ -75,7 +75,6 @@ case class WindowSpecDefinition(
frameSpecification.isInstanceOf[SpecifiedWindowFrame]

override def nullable: Boolean = true
override def foldable: Boolean = false
Member

I get that this is redundant, or happens to be right now, but I don't think I'd remove it in a docs-only change

Contributor Author

Correct. Reverting...

@@ -26,7 +26,8 @@ import org.apache.spark.sql.types._
import org.apache.spark.unsafe.types.CalendarInterval

/**
* Test basic expression parsing. If a type of expression is supported it should be tested here.
* Test basic expression parsing.
* If the type of an expression is supported it should be tested here.
Member

This is a no-op change, I'd avoid this.

Contributor Author

Almost. I replaced 'a' with 'the' and added 'an' before 'expression'.

@@ -60,7 +60,7 @@ import org.apache.spark.util.Utils
* The builder can also be used to create a new session:
*
* {{{
* SparkSession.builder()
* SparkSession.builder
Member

Is this for consistency? it also seems not worth changing otherwise

Contributor Author

Consistency (and one of my recommended coding styles).
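The rule behind the preference: Scala lets a call site omit an empty argument list in many cases -- always for Java-defined methods, and in Scala 2 also for Scala-defined empty-paren methods such as SparkSession.builder() (newer compilers deprecate that auto-application). A tiny illustration using a Java-defined method rather than Spark's API:

```scala
// String#length is Java-defined, so both call forms compile everywhere;
// SparkSession.builder vs builder() is the analogous choice in Scala 2.
object ParensSketch {
  val s = "SparkSession"
  val withParens: Int = s.length()   // explicit empty argument list
  val withoutParens: Int = s.length  // same call, parentheses omitted
}
```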

@SparkQA

SparkQA commented Mar 29, 2017

Test build #75368 has finished for PR 17417 at commit 913dbb8.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Mar 30, 2017

Looks good, just needs a rebase now

@SparkQA

SparkQA commented Mar 30, 2017

Test build #75388 has finished for PR 17417 at commit ae57b33.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Mar 30, 2017

Merged to master

@asfgit asfgit closed this in 0197262 Mar 30, 2017
@jaceklaskowski jaceklaskowski deleted the window-expression-scaladoc branch March 30, 2017 15:21