
[DOCS] Docs-only improvements #17417

Closed

Conversation

jaceklaskowski
Contributor

…adoc

What changes were proposed in this pull request?

Use recommended values for row boundaries in Window's scaladoc, i.e. Window.unboundedPreceding, Window.unboundedFollowing, and Window.currentRow (that were introduced in 2.1.0).
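For readers outside the PR: the three constants are sentinels (`Window.unboundedPreceding` is `Long.MinValue`, `Window.unboundedFollowing` is `Long.MaxValue`, and `Window.currentRow` is `0L`). Below is a Spark-free sketch of the row-frame arithmetic they stand in for; the `FrameSketch` object and its `frame` function are illustrative inventions for this discussion, not Spark's implementation.

```scala
// Spark-free sketch of row-frame boundaries. The sentinel values match
// Spark's Window constants; the frame function itself is illustrative only.
object FrameSketch {
  val unboundedPreceding: Long = Long.MinValue  // Window.unboundedPreceding
  val unboundedFollowing: Long = Long.MaxValue  // Window.unboundedFollowing
  val currentRow: Long = 0L                     // Window.currentRow

  // Indices covered by rowsBetween(start, end) for the row at index i in a
  // partition of n rows; both boundaries are inclusive and relative to the
  // current row, clamped to the partition.
  def frame(n: Int, i: Int, start: Long, end: Long): Range = {
    val lo = if (start == unboundedPreceding) 0 else math.max(0, i + start.toInt)
    val hi = if (end == unboundedFollowing) n - 1 else math.min(n - 1, i + end.toInt)
    lo to hi
  }
}
```

For example, `frame(5, 2, FrameSketch.currentRow, 1)` covers indices 2 to 3, matching `rowsBetween(Window.currentRow, 1)` for the middle row of a five-row partition.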

How was this patch tested?

Local build

@SparkQA

SparkQA commented Mar 24, 2017

Test build #75180 has finished for PR 17417 at commit efc420b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 25, 2017

Test build #75183 has finished for PR 17417 at commit 07a36f8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@srowen srowen left a comment

I'm not sure any of these changes are necessary?

@@ -22,7 +22,7 @@ import org.apache.spark.sql.Column
import org.apache.spark.sql.catalyst.expressions._

/**
* Utility functions for defining window in DataFrames.
* Utility functions for defining window in Datasets.
Member

These are used with DataFrames, right? At least that's what I have used Window for, am I missing something?

Contributor Author

Sure. I'm going to revert the changes. There's little value in them. I'd rather see the changes approved in general than fight over DataFrame vs Dataset.

p.s. There's no DataFrame class in Spark SQL; it's just a type alias for Dataset[Row] -- see https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/package.scala#L46.

Member

It's an alias, yes, but it certainly exists as a user-facing type.
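The alias point can be seen in miniature. The names below are stand-ins for illustration, not Spark's actual classes:

```scala
// A type alias is fully interchangeable with the type it names, which is
// why the DataFrame/Dataset[Row] distinction is cosmetic at the type level.
// Row, Dataset and DataFrame here are stand-ins, not Spark's definitions.
object AliasSketch {
  class Row
  class Dataset[T]
  type DataFrame = Dataset[Row]          // cf. org.apache.spark.sql.package

  val df: DataFrame = new Dataset[Row]   // a DataFrame is a Dataset[Row]...
  val ds: Dataset[Row] = df              // ...and assigns back with no cast
}
```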

@@ -113,12 +113,12 @@ object Window {
* Creates a [[WindowSpec]] with the frame boundaries defined,
* from `start` (inclusive) to `end` (inclusive).
*
* Both `start` and `end` are relative positions from the current row. For example, "0" means
* Both `start` and `end` are relative positions to the current row. For example, "0" means
Member

I think the right phrasing is: "are positions relative to the current row". The current text is OK IMHO

Contributor Author

I'll fix it to be more accurate (that's the purpose of this particular change, so the more accurate the merrier). Thanks!

* "current row", while "-1" means the row before the current row, and "5" means the fifth row
* after the current row.
*
* We recommend users use `Window.unboundedPreceding`, `Window.unboundedFollowing`,
* and `Window.currentRow` to specify special boundary values, rather than using integral
* We recommend users to use [[Window.unboundedPreceding]], [[Window.unboundedFollowing]],
Member

"We recommend that users use" is correct, but 'that' can be omitted and it's still correct.
I think the backticks are on purpose as many scaladoc refs like this also cause doc failures. At least you need to verify this before changing.

Contributor Author

Leaving 'that' out is incorrect -- see http://dictionary.cambridge.org/dictionary/english/recommend, where 'to' is even highlighted to make the point.

* sum('id) over Window.partitionBy('category).orderBy('id).rowsBetween(0,1))
* .show()
* val byCategoryOrderedById =
* Window.partitionBy('category).orderBy('id).rowsBetween(Window.currentRow, 1)
Member

Why this change? 0 should mean current row.

Contributor Author

See the above change where Spark devs "recommend that users use" the values by their aliases not their numeric values.

Member

Got it, and I think there are also doc examples like this in Column.scala and WindowSpec.scala that could be similarly improved
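For intuition, a frame of rowsBetween(Window.currentRow, 1) aggregates each row together with the one row after it. A plain-Scala sketch of just that frame arithmetic follows; partitioning and ordering, which Spark handles, are deliberately ignored here.

```scala
// Plain-Scala sketch of the frame rowsBetween(Window.currentRow, 1):
// each element is summed with its immediate successor, when one exists.
// This mirrors only the frame arithmetic, not Spark's execution model.
object SlidingSketch {
  val ids = Seq(10, 20, 30, 40)
  val slidingSums: Seq[Int] = ids.indices.map { i =>
    ids.slice(i, math.min(i + 2, ids.length)).sum
  }
  // slidingSums == Seq(30, 50, 70, 40); the last row has no successor
}
```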

@SparkQA

SparkQA commented Mar 25, 2017

Test build #75217 has finished for PR 17417 at commit 07001a9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

* while "-1" means one off before the current row, and "5" means the five off after the
* current row.
*
* We recommend users use `Window.unboundedPreceding`, `Window.unboundedFollowing`,
* We recommend that users use `Window.unboundedPreceding`, `Window.unboundedFollowing`,
Member

Either way is correct, it wasn't wrong

@@ -200,9 +200,9 @@ object Window {
* }}}
*
* @param start boundary start, inclusive. The frame is unbounded if this is
* the minimum long value (`Window.unboundedPreceding`).
* the minimum long value, i.e. `Window.unboundedPreceding`.
Member

Likewise this is effectively identical. I wouldn't make changes like this

@SparkQA

SparkQA commented Mar 26, 2017

Test build #75240 has finished for PR 17417 at commit bf82dc6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jaceklaskowski
Contributor Author

Hey @srowen, I'd appreciate your looking at the changes again and commenting (or merging). Thanks!

@srowen
Member

srowen commented Mar 27, 2017

How about the other files I mentioned? I think they can take similar changes. I think you can roll your other PR into this. They're both kinda misc doc improvements.

@jaceklaskowski
Contributor Author

I'm going to merge the two PRs with your comments applied (i.e. excluding changes that are not strictly doc-only). Thanks a lot for your time, Sean. Much appreciated.

@jaceklaskowski jaceklaskowski changed the title [SQL][DOC] Use recommended values for row boundaries in Window's scal… [DOC] Doc-only improvements Mar 29, 2017
@jaceklaskowski jaceklaskowski changed the title [DOC] Doc-only improvements [DOCS] Doc-only improvements Mar 29, 2017
@jaceklaskowski jaceklaskowski changed the title [DOCS] Doc-only improvements [DOCS] Docs-only improvements Mar 29, 2017
@SparkQA

SparkQA commented Mar 29, 2017

Test build #75352 has finished for PR 17417 at commit 0c4a77e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait Source

@SparkQA

SparkQA commented Mar 29, 2017

Test build #75353 has finished for PR 17417 at commit 8dc1f04.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jaceklaskowski
Contributor Author

Executed cd docs && SKIP_PYTHONDOC=1 SKIP_RDOC=1 jekyll serve to check the changes, and they seemed fine. I had to fix some extra javadoc-related places to please jekyll.

@srowen Ready to review the changes once more? Thanks.

@SparkQA

SparkQA commented Mar 29, 2017

Test build #75354 has finished for PR 17417 at commit e09802d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 29, 2017

Test build #75355 has finished for PR 17417 at commit db426e3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@srowen srowen left a comment

Most of this looks fine to me, but I have a few questions still about the changes.

@@ -52,16 +52,15 @@
* This class implements sort-based shuffle's hash-style shuffle fallback path. This write path
* writes incoming records to separate files, one file per reduce partition, then concatenates these
* per-partition files to form a single output file, regions of which are served to reducers.
* Records are not buffered in memory. This is essentially identical to
* {@link org.apache.spark.shuffle.hash.HashShuffleWriter}, except that it writes output in a format
* Records are not buffered in memory. It writes output in a format
Member

Why remove this particular comment?

Contributor Author

HashShuffleWriter is long gone.

@@ -75,7 +75,6 @@ case class WindowSpecDefinition(
frameSpecification.isInstanceOf[SpecifiedWindowFrame]

override def nullable: Boolean = true
override def foldable: Boolean = false
Member

I get that this is redundant, or happens to be right now, but I don't think I'd remove it in a docs-only change

Contributor Author

Correct. Reverting...

@@ -26,7 +26,8 @@ import org.apache.spark.sql.types._
import org.apache.spark.unsafe.types.CalendarInterval

/**
* Test basic expression parsing. If a type of expression is supported it should be tested here.
* Test basic expression parsing.
* If the type of an expression is supported it should be tested here.
Member

This is a no-op change, I'd avoid this.

Contributor Author

Almost. I replaced 'a' with 'the' and added 'an' before 'expression'.

@@ -60,7 +60,7 @@ import org.apache.spark.util.Utils
* The builder can also be used to create a new session:
*
* {{{
* SparkSession.builder()
* SparkSession.builder
Member

Is this for consistency? it also seems not worth changing otherwise

Contributor Author

Consistency (and one of my recommended coding styles).
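The rule behind the preference: Scala lets a call site omit an empty argument list in many cases -- always for Java-defined methods, and in Scala 2 also for Scala-defined empty-paren methods such as SparkSession.builder() (newer compilers deprecate that auto-application). A tiny illustration using a Java-defined method rather than Spark's API:

```scala
// String#length is Java-defined, so both call forms compile everywhere;
// SparkSession.builder vs builder() is the analogous choice in Scala 2.
object ParensSketch {
  val s = "SparkSession"
  val withParens: Int = s.length()   // explicit empty argument list
  val withoutParens: Int = s.length  // same call, parentheses omitted
}
```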

@SparkQA

SparkQA commented Mar 29, 2017

Test build #75368 has finished for PR 17417 at commit 913dbb8.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Mar 30, 2017

Looks good, just needs a rebase now

@SparkQA

SparkQA commented Mar 30, 2017

Test build #75388 has finished for PR 17417 at commit ae57b33.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Mar 30, 2017

Merged to master

@asfgit asfgit closed this in 0197262 Mar 30, 2017
@jaceklaskowski jaceklaskowski deleted the window-expression-scaladoc branch March 30, 2017 15:21