[DOCS] Docs-only improvements #17417
Conversation
Test build #75180 has finished for PR 17417 at commit
Test build #75183 has finished for PR 17417 at commit
I'm not sure any of these changes are necessary?
@@ -22,7 +22,7 @@ import org.apache.spark.sql.Column
 import org.apache.spark.sql.catalyst.expressions._

 /**
- * Utility functions for defining window in DataFrames.
+ * Utility functions for defining window in Datasets.
These are used with DataFrames, right? At least that's what I have used Window for, am I missing something?
Sure. I'm going to revert the changes. There's little value in them. I'd rather see the changes approved in general than fight over DataFrame vs Dataset.
p.s. There's no `DataFrame` in Spark SQL; it's just a type alias of `Dataset[Row]` -- see https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/package.scala#L46.
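For reference, the alias in question lives in Spark's `sql` package object; a minimal sketch of the same pattern (names mirror the linked file, which is the authoritative source):

```scala
package org.apache.spark

package object sql {
  // In Spark 2.x, DataFrame is not a class of its own but a type alias,
  // so DataFrame and Dataset[Row] are fully interchangeable at compile time.
  type DataFrame = Dataset[Row]
}
```

Because it is only an alias, `DataFrame` still appears as a user-facing name in signatures and docs, which is why both names come up in the discussion above.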
It's an alias, yes, but it certainly exists as a user-facing type.
@@ -113,12 +113,12 @@ object Window {
 * Creates a [[WindowSpec]] with the frame boundaries defined,
 * from `start` (inclusive) to `end` (inclusive).
 *
- * Both `start` and `end` are relative positions from the current row. For example, "0" means
+ * Both `start` and `end` are relative positions to the current row. For example, "0" means
I think the right phrasing is: "are positions relative to the current row". The current text is OK IMHO
I'll fix it to be more accurate (that's the purpose of this particular change so the more accurate the merrier). Thanks!
 * "current row", while "-1" means the row before the current row, and "5" means the fifth row
 * after the current row.
 *
- * We recommend users use `Window.unboundedPreceding`, `Window.unboundedFollowing`,
- * and `Window.currentRow` to specify special boundary values, rather than using integral
+ * We recommend users to use [[Window.unboundedPreceding]], [[Window.unboundedFollowing]],
"We recommend that users use" is correct, but 'that' can be omitted and it's still correct.
I think the backticks are on purpose as many scaladoc refs like this also cause doc failures. At least you need to verify this before changing.
Leaving 'that' out is incorrect -- see http://dictionary.cambridge.org/dictionary/english/recommend where 'to' is even highlighted to make the point.
- * sum('id) over Window.partitionBy('category).orderBy('id).rowsBetween(0,1))
- *   .show()
+ * val byCategoryOrderedById =
+ *   Window.partitionBy('category).orderBy('id).rowsBetween(Window.currentRow, 1)
Why this change? 0 should mean current row.
See the above change where Spark devs "recommend that users use" the values by their aliases not their numeric values.
Got it, and I think there are also doc examples like this in `Column.scala` and `WindowSpec.scala` that could be similarly improved.
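To make the recommendation concrete, here is a sketch of the two equivalent frame definitions under discussion (it assumes a Spark 2.1+ session with `spark.implicits._` imported and a `df` holding `category` and `id` columns; `byNumbers` and `byAliases` are illustrative names, not from the PR):

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.sum

// Numeric boundaries: 0 is the current row, 1 the row after it.
val byNumbers = Window.partitionBy('category).orderBy('id).rowsBetween(0, 1)

// The same frame spelled with the named constants added in 2.1.0,
// which is what the scaladoc now recommends.
val byAliases = Window.partitionBy('category).orderBy('id)
  .rowsBetween(Window.currentRow, 1)

// Either spec describes the same frame: a sliding sum over the
// current row and its immediate successor within each category.
df.select('id, sum('id) over byAliases).show()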
Test build #75217 has finished for PR 17417 at commit
 * while "-1" means one off before the current row, and "5" means the five off after the
 * current row.
 *
- * We recommend users use `Window.unboundedPreceding`, `Window.unboundedFollowing`,
+ * We recommend that users use `Window.unboundedPreceding`, `Window.unboundedFollowing`,
Either way is correct, it wasn't wrong
@@ -200,9 +200,9 @@ object Window {
 * }}}
 *
 * @param start boundary start, inclusive. The frame is unbounded if this is
- *              the minimum long value (`Window.unboundedPreceding`).
+ *              the minimum long value, i.e. `Window.unboundedPreceding`.
Likewise this is effectively identical. I wouldn't make changes like this
Test build #75240 has finished for PR 17417 at commit
Hey @srowen Would appreciate you looking at the changes again and commenting (or merging). Thanks!
How about the other files I mentioned? I think they can take similar changes. I think you can roll your other PR into this. They're both kinda misc doc improvements.
I'm going to merge the two PRs with your comments applied (i.e. excluding changes that are not necessarily doc-only). Thanks a lot for your time, Sean. Much appreciated.
Force-pushed bf82dc6 to 802c150
Test build #75352 has finished for PR 17417 at commit
Test build #75353 has finished for PR 17417 at commit
@srowen Ready to review the changes once more? Thanks.
Test build #75354 has finished for PR 17417 at commit
Test build #75355 has finished for PR 17417 at commit
Most of this looks fine to me, but I have a few questions still about the changes.
@@ -52,16 +52,15 @@
 * This class implements sort-based shuffle's hash-style shuffle fallback path. This write path
 * writes incoming records to separate files, one file per reduce partition, then concatenates these
 * per-partition files to form a single output file, regions of which are served to reducers.
- * Records are not buffered in memory. This is essentially identical to
- * {@link org.apache.spark.shuffle.hash.HashShuffleWriter}, except that it writes output in a format
+ * Records are not buffered in memory. It writes output in a format
Why remove this particular comment?
`HashShuffleWriter` is long gone.
@@ -75,7 +75,6 @@ case class WindowSpecDefinition(
   frameSpecification.isInstanceOf[SpecifiedWindowFrame]

  override def nullable: Boolean = true
- override def foldable: Boolean = false
I get that this is redundant, or happens to be right now, but I don't think I'd remove it in a docs-only change
Correct. Reverting...
@@ -26,7 +26,8 @@ import org.apache.spark.sql.types._
 import org.apache.spark.unsafe.types.CalendarInterval

 /**
- * Test basic expression parsing. If a type of expression is supported it should be tested here.
+ * Test basic expression parsing.
+ * If the type of an expression is supported it should be tested here.
This is a no-op change, I'd avoid this.
Almost. I replaced `a` with `the` and added `an` before `expression`.
@@ -60,7 +60,7 @@ import org.apache.spark.util.Utils
 * The builder can also be used to create a new session:
 *
 * {{{
- *   SparkSession.builder()
+ *   SparkSession.builder
Is this for consistency? It also seems not worth changing otherwise.
Consistency (and one of my recommended coding styles).
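For context, Scala lets an empty-parameter-list method be called with or without the parentheses, so both spellings compile; a sketch of the paren-less style the docs settle on (the app name and master are made-up values, not from the PR):

```scala
import org.apache.spark.sql.SparkSession

// `SparkSession.builder` and `SparkSession.builder()` resolve to the
// same call; the docs now consistently drop the empty parentheses.
val spark = SparkSession.builder
  .appName("docs-example")   // hypothetical
  .master("local[*]")
  .getOrCreate()
```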
Force-pushed f21f434 to 913dbb8
Test build #75368 has finished for PR 17417 at commit
Looks good, just needs a rebase now
Force-pushed 913dbb8 to ae57b33
Test build #75388 has finished for PR 17417 at commit
Merged to master
…adoc
What changes were proposed in this pull request?

Use recommended values for row boundaries in Window's scaladoc, i.e. `Window.unboundedPreceding`, `Window.unboundedFollowing`, and `Window.currentRow` (that were introduced in 2.1.0).

How was this patch tested?

Local build