[SPARK-23062][SQL] Improve EXCEPT documentation #20254

henryr · 2018-01-13T00:25:01Z

What changes were proposed in this pull request?

Make the default behavior of EXCEPT (i.e. EXCEPT DISTINCT) more
explicit in the documentation, and call out the change in behavior
from 1.x.

dongjoon-hyun

Could you update the R documentation, too?

henryr · 2018-01-13T01:12:48Z

Done, thanks for pointing that out!

HyukjinKwon · 2018-01-13T01:24:27Z

R/pkg/R/DataFrame.R

@@ -2873,6 +2873,7 @@ setMethod("intersect",
 #' @rdname except
 #' @export
 #' @note except since 1.4.0
+#' @note behaviour changed from \code{EXCEPT ALL} to \code{EXCEPT DISTINCT} in 2.0.


Hmm, I think we use @note for version specification in SparkR. Just adding

Note: blabla

should be fine like other places.

I don't mind it either way, but to note:

R doc order and whitespace are significant. If you use `#' Note:`, you must put it after L2856. If you add an extra `#'` (i.e. an empty line), it becomes the Details section, which might be the right place; see http://spark.apache.org/docs/latest/api/R/awaitTermination.html

i.e.

#' but not in another SparkDataFrame. This is equivalent to \code{EXCEPT DISTINCT} in SQL.
#'
#' Note: Before Spark 2.0.0, the behavior was equivalent to `EXCEPT ALL` in SQL.
#'
#' @param x a SparkDataFrame.

This is wrong. See my previous comment.

HyukjinKwon · 2018-01-13T01:26:06Z

python/pyspark/sql/dataframe.py

-        This is equivalent to `EXCEPT` in SQL.
+        This is equivalent to `EXCEPT DISTINCT` in SQL.
+
+        (Note: Before Spark 2.0, the behavior was equivalent to `EXCEPT ALL` in SQL.)


In PySpark, we can use `.. note::`. This makes the doc pretty :).

nit: 2.0 to 2.0.0

Actually, before 2.0, it is not equivalent to EXCEPT ALL. For details, see the PR: #12736
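For readers unfamiliar with the `.. note::` directive suggested in this thread, here is a minimal sketch of how it sits in a Python docstring. The class `DataFrameSketch` is a hypothetical stand-in, and the note text is illustrative only; it is not the wording from the patch.

```python
class DataFrameSketch:
    """Hypothetical stand-in for a DataFrame class; only the docstring matters here."""

    def subtract(self, other):
        """Return a new DataFrame containing rows in this frame but not in another frame.

        This is equivalent to `EXCEPT DISTINCT` in SQL.

        .. note:: Duplicate rows are removed from the result.
        """
        # Illustrative only: this sketch documents behavior, it does not implement it.
        raise NotImplementedError("documentation sketch only")


# Print the raw docstring to see the directive as a doc generator would receive it.
print(DataFrameSketch.subtract.__doc__)
```

A reStructuredText-based doc generator would typically render the directive as a highlighted note box rather than inline text.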

SparkQA · 2018-01-13T03:49:27Z

Test build #86067 has finished for PR 20254 at commit 9fe5707.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-01-13T05:00:28Z

Test build #86068 has finished for PR 20254 at commit 5562a16.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2018-01-13T16:26:30Z

Also, cc @gatorsmile .

felixcheung · 2018-01-13T17:55:05Z

I see. From reading that PR, I think perhaps we should reference the migration guide in the SQL programming guide instead of putting the whole description here.

gatorsmile · 2018-01-13T18:16:39Z

Yeah, we should document the behavior changes, but that was just a bug fix to fully follow the semantics of ANSI SQL EXCEPT DISTINCT.

felixcheung · 2018-01-14T06:58:47Z

@henryr, could you update this PR to only include EXCEPT DISTINCT, without the notes?

gatorsmile · 2018-01-14T10:02:01Z

Just FYI, in ANSI SQL, EXCEPT = EXCEPT DISTINCT
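To make the distinction discussed here concrete, the following is a rough pure-Python sketch of the two semantics on lists of tuples. It is not Spark code; the helper names `except_distinct` and `except_all` and the sample rows are made up for illustration.

```python
from collections import Counter


def except_distinct(left, right):
    """Rows of `left` absent from `right`, de-duplicated (SQL EXCEPT / EXCEPT DISTINCT)."""
    right_set = set(right)
    seen = set()
    result = []
    for row in left:
        # Keep a row only once, and only if it never appears on the right side.
        if row not in right_set and row not in seen:
            seen.add(row)
            result.append(row)
    return result


def except_all(left, right):
    """Multiset difference: each right row cancels one matching left row (SQL EXCEPT ALL)."""
    remaining = Counter(right)
    result = []
    for row in left:
        if remaining[row] > 0:
            remaining[row] -= 1  # this left row is cancelled by one right row
        else:
            result.append(row)
    return result


# Hypothetical sample data.
left = [("a", 1), ("a", 1), ("a", 1), ("b", 2), ("c", 3), ("c", 3)]
right = [("a", 1), ("b", 2)]

print(except_distinct(left, right))  # [('c', 3)]
print(except_all(left, right))       # [('a', 1), ('a', 1), ('c', 3), ('c', 3)]
```

With EXCEPT DISTINCT, a single matching row on the right removes every copy on the left and the result has no duplicates. With EXCEPT ALL, multiplicities are subtracted.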

## What changes were proposed in this pull request?

Make the default behavior of EXCEPT (i.e. EXCEPT DISTINCT) more explicit in the documentation.

henryr · 2018-01-16T23:45:27Z

Thanks all for the pointers and feedback! I've removed the references to the behavior before 2.0, and now the changes just make it explicit that this is EXCEPT DISTINCT. (I appreciate that that's the meaning of EXCEPT per ANSI, but the behavior change since 1.x has confused users I've spoken to, so it seems worthwhile to make the documentation as clear as possible.)

gatorsmile · 2018-01-17T02:39:03Z

@henryr Since Spark 2.3, Spark SQL documents all the behavior changes in Migration Guides. Hopefully, this can help our end users.

gatorsmile · 2018-01-17T02:39:30Z

LGTM

SparkQA · 2018-01-17T02:57:52Z

Test build #86211 has finished for PR 20254 at commit 0b51997.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

## What changes were proposed in this pull request?

Make the default behavior of EXCEPT (i.e. EXCEPT DISTINCT) more explicit in the documentation, and call out the change in behavior from 1.x.

Author: Henry Robinson <[email protected]>

Closes #20254 from henryr/spark-23062.

(cherry picked from commit 1f3d933)
Signed-off-by: gatorsmile <[email protected]>

dongjoon-hyun reviewed Jan 13, 2018

henryr force-pushed the spark-23062 branch from 9fe5707 to 5562a16 on January 13, 2018 01:12

HyukjinKwon reviewed Jan 13, 2018

henryr force-pushed the spark-23062 branch from 5562a16 to 0b51997 on January 16, 2018 23:45

asfgit closed this in 1f3d933 Jan 17, 2018
