[SPARK-23062][SQL] Improve EXCEPT documentation #20254
Could you update the R documentation, too?
Done, thanks for pointing that out!
R/pkg/R/DataFrame.R
@@ -2873,6 +2873,7 @@ setMethod("intersect",
#' @rdname except
#' @export
#' @note except since 1.4.0
#' @note behaviour changed from \code{EXCEPT ALL} to \code{EXCEPT DISTINCT} in 2.0.
Ur.. I think we use `@note` for version specification in SparkR. Just adding `Note: blabla` should be fine, like in other places.
I don't mind it either way, but to note:
- R doc order and whitespace are significant. If you use `#' Note:`, you must put it after L2856; if you put an extra `#'` (i.e. an empty line) before it, it becomes the `Details` section, which might be the right place; see http://spark.apache.org/docs/latest/api/R/awaitTermination.html
i.e.:
#' but not in another SparkDataFrame. This is equivalent to \code{EXCEPT DISTINCT} in SQL.
#'
#' Note: Before Spark 2.0.0, the behavior was equivalent to `EXCEPT ALL` in SQL.
#'
#' @param x a SparkDataFrame.
This is wrong. See my previous comment.
python/pyspark/sql/dataframe.py
-    This is equivalent to `EXCEPT` in SQL.
+    This is equivalent to `EXCEPT DISTINCT` in SQL.
+
+    (Note: Before Spark 2.0, the behavior was equivalent to `EXCEPT ALL` in SQL.)
In PySpark, we can use `.. note::`. This makes the doc pretty :).
nit: `2.0` to `2.0.0`
Actually, before 2.0, it is not equivalent to EXCEPT ALL. For details, see the PR: #12736
Test build #86067 has finished for PR 20254 at commit
Test build #86068 has finished for PR 20254 at commit
Also, cc @gatorsmile.
I see. From reading that PR, I think we should perhaps reference the migration guide in the SQL programming guide instead of putting the whole description here.
Yeah, we should document behavior changes, but that was just a bug fix to follow ANSI SQL semantics exactly.
@henryr, could you update this PR to only mention EXCEPT DISTINCT, without the notes?
Just FYI, in ANSI SQL, …
## What changes were proposed in this pull request?

Make the default behavior of EXCEPT (i.e. EXCEPT DISTINCT) more explicit in the documentation.

Thanks all for the pointers and feedback! I've removed the references to the behavior before 2.0, and now the changes just make it explicit that this is …
@henryr Since Spark 2.3, Spark SQL documents all behavior changes in the migration guides. Hopefully this helps our end users.
LGTM
Test build #86211 has finished for PR 20254 at commit
## What changes were proposed in this pull request?

Make the default behavior of EXCEPT (i.e. EXCEPT DISTINCT) more explicit in the documentation, and call out the change in behavior from 1.x.

Author: Henry Robinson
Closes #20254 from henryr/spark-23062.
(cherry picked from commit 1f3d933)
Signed-off-by: gatorsmile
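To make the distinction the docs now spell out concrete, here is a minimal plain-Python model of the two SQL semantics on lists of hashable values standing in for rows. It is not Spark code, and `except_distinct` / `except_all` are hypothetical helper names used only for illustration; in SQL the result order is not guaranteed, so the ordering here is just one valid choice.

```python
from collections import Counter


def except_distinct(left, right):
    """EXCEPT / EXCEPT DISTINCT: rows of `left` that never appear in `right`,
    with duplicates removed (set difference)."""
    right_rows = set(right)
    seen = set()
    result = []
    for row in left:
        if row not in right_rows and row not in seen:
            seen.add(row)
            result.append(row)
    return result


def except_all(left, right):
    """EXCEPT ALL: bag difference. A row occurring m times in `left` and
    n times in `right` is kept max(m - n, 0) times."""
    # Counter subtraction drops rows whose count would be zero or negative.
    remaining = Counter(left) - Counter(right)
    result = []
    for row in left:
        if remaining[row] > 0:  # missing keys read as 0 on a Counter
            remaining[row] -= 1
            result.append(row)
    return result


left = [1, 1, 2, 3, 3, 3]
right = [1, 3]
print(except_distinct(left, right))  # [2]
print(except_all(left, right))       # [1, 2, 3, 3]
```

The difference shows up only when duplicates are involved: EXCEPT DISTINCT removes every row that appears in `right` at all and collapses the rest, while EXCEPT ALL subtracts occurrence counts and keeps leftover duplicates.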