Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-23062][SQL] Improve EXCEPT documentation #20254

Closed
wants to merge 1 commit into from

Conversation

henryr
Copy link
Contributor

@henryr henryr commented Jan 13, 2018

What changes were proposed in this pull request?

Make the default behavior of EXCEPT (i.e. EXCEPT DISTINCT) more
explicit in the documentation, and call out the change in behavior
from 1.x.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you update R document, too?

@henryr
Copy link
Contributor Author

henryr commented Jan 13, 2018

Done, thanks for pointing that out!

@@ -2873,6 +2873,7 @@ setMethod("intersect",
#' @rdname except
#' @export
#' @note except since 1.4.0
#' @note behaviour changed from \code{EXCEPT ALL} to \code{EXCEPT DISTINCT} in 2.0.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ur.. I think we have use @note for version specification in SparkR. Just adding

Note: blabla

should be fine like other places.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind it either way, but to note:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ie.

#' but not in another SparkDataFrame. This is equivalent to \code{EXCEPT DISTINCT} in SQL.
#'
#' Note: Before Spark 2.0.0, the behavior was equivalent to `EXCEPT ALL` in SQL.
#'
#' @param x a SparkDataFrame.

Copy link
Member

@gatorsmile gatorsmile Jan 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wrong. See my previous comment.

This is equivalent to `EXCEPT` in SQL.
This is equivalent to `EXCEPT DISTINCT` in SQL.

(Note: Before Spark 2.0, the behavior was equivalent to `EXCEPT ALL` in SQL.)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In PySpark, we can use .. note:: . This makes the doc pretty :).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: 2.0 to 2.0.0

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, before 2.0, it is not equivalent to EXCEPT ALL. For details, see the PR: #12736

@SparkQA
Copy link

SparkQA commented Jan 13, 2018

Test build #86067 has finished for PR 20254 at commit 9fe5707.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 13, 2018

Test build #86068 has finished for PR 20254 at commit 5562a16.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

Also, cc @gatorsmile .

@felixcheung
Copy link
Member

felixcheung commented Jan 13, 2018 via email

@gatorsmile
Copy link
Member

Yeah, we should document the behavior changes, but that was just a bug fix for 100% following the semantics of ANSI-SQL EXCEPT DISTINCT

@felixcheung
Copy link
Member

@henryr could you update this PR to only include EXCEPT DISTINCT without the notes

@gatorsmile
Copy link
Member

Just FYI, in ANSI SQL, EXCEPT = EXCEPT DISTINCT

## What changes were proposed in this pull request?

Make the default behavior of EXCEPT (i.e. EXCEPT DISTINCT) more
explicit in the documentation.
@henryr
Copy link
Contributor Author

henryr commented Jan 16, 2018

Thanks all for the pointers and feedback! I've removed the references to the behavior before 2.0, and now the changes just make it explicit that this is EXCEPT DISTINCT (I appreciate that that's the meaning of EXCEPT per ANSI, but the behavior change since 1.x has confused users I've spoken to so seems worthwhile to make the documentation as clear as possible).

@gatorsmile
Copy link
Member

@henryr Since Spark 2.3, Spark SQL documents all the behavior changes in Migration Guides. Hopefully, this can help our end users.

@gatorsmile
Copy link
Member

LGTM

@SparkQA
Copy link

SparkQA commented Jan 17, 2018

Test build #86211 has finished for PR 20254 at commit 0b51997.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Jan 17, 2018
## What changes were proposed in this pull request?

Make the default behavior of EXCEPT (i.e. EXCEPT DISTINCT) more
explicit in the documentation, and call out the change in behavior
from 1.x.

Author: Henry Robinson <[email protected]>

Closes #20254 from henryr/spark-23062.

(cherry picked from commit 1f3d933)
Signed-off-by: gatorsmile <[email protected]>
@asfgit asfgit closed this in 1f3d933 Jan 17, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants