Branch 2.3 #21444

Closed
wants to merge 485 commits into from
Changes from all commits
485 commits
07a8f4d
[SPARK-23293][SQL] fix data source v2 self join
cloud-fan Feb 1, 2018
ab23785
[SPARK-23296][YARN] Include stacktrace in YARN-app diagnostic
gerashegalov Feb 1, 2018
7baae3a
[SPARK-23284][SQL] Document the behavior of several ColumnVector's ge…
viirya Feb 2, 2018
2b07452
[SPARK-23301][SQL] data source column pruning should work for arbitra…
cloud-fan Feb 2, 2018
e5e9f9a
[SPARK-23312][SQL] add a config to turn off vectorized cache reader
cloud-fan Feb 2, 2018
56eb9a3
[SPARK-23064][SS][DOCS] Stream-stream joins Documentation - follow up
tdas Feb 3, 2018
dcd0af4
[SQL] Minor doc update: Add an example in DataFrameReader.schema
rxin Feb 3, 2018
b614c08
[SPARK-23317][SQL] rename ContinuousReader.setOffset to setStartOffset
cloud-fan Feb 3, 2018
1bcb372
[SPARK-23311][SQL][TEST] add FilterFunction test case for test Combin…
heary-cao Feb 3, 2018
4de2061
[SPARK-23305][SQL][TEST] Test `spark.sql.files.ignoreMissingFiles` fo…
dongjoon-hyun Feb 3, 2018
be3de87
[MINOR][DOC] Use raw triple double quotes around docstrings where the…
ashashwat Feb 3, 2018
45f0f4f
[SPARK-21658][SQL][PYSPARK] Revert "[] Add default None for value in …
HyukjinKwon Feb 3, 2018
430025c
[SPARK-22036][SQL][FOLLOWUP] Fix decimalArithmeticOperations.sql
wangyum Feb 4, 2018
e688ffe
[SPARK-23307][WEBUI] Sort jobs/stages/tasks/queries with the complete…
zsxwing Feb 5, 2018
173449c
[SPARK-23310][CORE] Turn off read ahead input stream for unshafe shuf…
Feb 5, 2018
4aa9aaf
[SPARK-23330][WEBUI] Spark UI SQL executions page throws NPE
jiangxb1987 Feb 5, 2018
521494d
[SPARK-23326][WEBUI] schedulerDelay should return 0 when the task is …
zsxwing Feb 6, 2018
4493303
[SPARK-23290][SQL][PYTHON][BACKPORT-2.3] Use datetime.date for date t…
ueshin Feb 6, 2018
a511544
[SPARK-23334][SQL][PYTHON] Fix pandas_udf with return type StringType…
ueshin Feb 6, 2018
7782fd0
[SPARK-23310][CORE][FOLLOWUP] Fix Java style check issues.
ueshin Feb 6, 2018
036a04b
[SPARK-23312][SQL][FOLLOWUP] add a config to turn off vectorized cach…
cloud-fan Feb 6, 2018
77cccc5
[MINOR][TEST] Fix class name for Pandas UDF tests
icexelloss Feb 6, 2018
f9c9132
[SPARK-23315][SQL] failed to get output from canonicalized data sourc…
cloud-fan Feb 6, 2018
874d3f8
[SPARK-23327][SQL] Update the description and tests of three external…
gatorsmile Feb 7, 2018
cb22e83
[SPARK-23122][PYSPARK][FOLLOWUP] Replace registerTempTable by createO…
gatorsmile Feb 7, 2018
05239af
[SPARK-23345][SQL] Remove open stream record even closing it fails
viirya Feb 7, 2018
2ba07d5
[SPARK-23300][TESTS][BRANCH-2.3] Prints out if Pandas and PyArrow are…
HyukjinKwon Feb 8, 2018
db59e55
Revert [SPARK-22279][SQL] Turn on spark.sql.hive.convertMetastoreOrc …
gatorsmile Feb 8, 2018
0538302
[SPARK-23319][TESTS][BRANCH-2.3] Explicitly specify Pandas and PyArro…
HyukjinKwon Feb 8, 2018
0c2a210
[SPARK-23348][SQL] append data using saveAsTable should adjust the da…
cloud-fan Feb 8, 2018
68f3a07
[SPARK-23268][SQL][FOLLOWUP] Reorganize packages in data source V2
cloud-fan Feb 8, 2018
dfb1614
[SPARK-23186][SQL] Initialize DriverManager first before loading JDBC…
dongjoon-hyun Feb 9, 2018
196304a
[SPARK-23328][PYTHON] Disallow default value None in na.replace/repla…
HyukjinKwon Feb 9, 2018
08eb95f
[SPARK-23358][CORE] When the number of partitions is greater than 2^2…
10110346 Feb 9, 2018
49771ac
[MINOR][HIVE] Typo fixes
jaceklaskowski Feb 10, 2018
f3a9a7f
[SPARK-23275][SQL] fix the thread leaking in hive/tests
Feb 10, 2018
b7571b9
[SPARK-23360][SQL][PYTHON] Get local timezone from environment via py…
ueshin Feb 10, 2018
9fa7b0e
[SPARK-23314][PYTHON] Add ambiguous=False when localizing tz-naive ti…
icexelloss Feb 11, 2018
8875e47
[SPARK-23387][SQL][PYTHON][TEST][BRANCH-2.3] Backport assertPandasEqu…
ueshin Feb 11, 2018
7e2a2b3
[SPARK-23376][SQL] creating UnsafeKVExternalSorter with BytesToBytesM…
cloud-fan Feb 11, 2018
79e8650
[SPARK-23390][SQL] Flaky Test Suite: FileBasedDataSourceSuite in Spar…
cloud-fan Feb 12, 2018
1e3118c
[SPARK-22977][SQL] fix web UI SQL tab for CTAS
cloud-fan Feb 12, 2018
d31c4ae
[SPARK-23391][CORE] It may lead to overflow for some integer multipli…
10110346 Feb 12, 2018
89f6fcb
Preparing Spark release v2.3.0-rc3
sameeragarwal Feb 12, 2018
70be603
Preparing development version 2.3.1-SNAPSHOT
sameeragarwal Feb 12, 2018
4e13820
[SPARK-23388][SQL] Support for Parquet Binary DecimalType in Vectoriz…
jamesthomp Feb 12, 2018
9632c46
[SPARK-22002][SQL][FOLLOWUP][TEST] Add a test to check if the origina…
ueshin Feb 12, 2018
2b80571
[SPARK-23313][DOC] Add a migration guide for ORC
dongjoon-hyun Feb 12, 2018
befb22d
[SPARK-23230][SQL] When hive.default.fileformat is other kinds of fil…
cxzl25 Feb 13, 2018
43f5e40
[SPARK-23352][PYTHON][BRANCH-2.3] Explicitly specify supported types …
HyukjinKwon Feb 13, 2018
3737c3d
[SPARK-20090][FOLLOW-UP] Revert the deprecation of `names` in PySpark
gatorsmile Feb 13, 2018
1c81c0c
[SPARK-23384][WEB-UI] When it has no incomplete(completed) applicatio…
Feb 13, 2018
dbb1b39
[SPARK-23053][CORE] taskBinarySerialization and task partitions calcu…
Feb 13, 2018
ab01ba7
[SPARK-23316][SQL] AnalysisException after max iteration reached for …
bogdanrdc Feb 13, 2018
320ffb1
[SPARK-23154][ML][DOC] Document backwards compatibility guarantees fo…
jkbradley Feb 13, 2018
4f6a457
[SPARK-23400][SQL] Add a constructors for ScalaUDF
gatorsmile Feb 13, 2018
bb26bdb
[SPARK-23399][SQL] Register a task completion listener first for OrcC…
dongjoon-hyun Feb 14, 2018
fd66a3b
[SPARK-23394][UI] In RDD storage page show the executor addresses ins…
attilapiros Feb 14, 2018
a5a8a86
Revert "[SPARK-23249][SQL] Improved block merging logic for partitions"
gatorsmile Feb 14, 2018
bd83f7b
[SPARK-23421][SPARK-22356][SQL] Document the behavior change in
gatorsmile Feb 15, 2018
129fd45
[SPARK-23094] Revert [] Fix invalid character handling in JsonDataSource
gatorsmile Feb 15, 2018
f2c0585
[SPARK-23419][SPARK-23416][SS] data source v2 write path should re-th…
cloud-fan Feb 15, 2018
d24d131
[SPARK-23422][CORE] YarnShuffleIntegrationSuite fix when SPARK_PREPEN…
gaborgsomogyi Feb 15, 2018
bae4449
[SPARK-23426][SQL] Use `hive` ORC impl and disable PPD for Spark 2.3.0
dongjoon-hyun Feb 15, 2018
03960fa
[MINOR][SQL] Fix an error message about inserting into bucketed tables
dongjoon-hyun Feb 15, 2018
0bd7765
[SPARK-23377][ML] Fixes Bucketizer with multiple columns persistence bug
viirya Feb 15, 2018
75bb19a
[SPARK-23413][UI] Fix sorting tasks by Host / Executor ID at the Stag…
attilapiros Feb 15, 2018
ccb0a59
[SPARK-23446][PYTHON] Explicitly check supported types in toPandas
HyukjinKwon Feb 16, 2018
8360da0
[SPARK-23381][CORE] Murmur3 hash generates a different value from oth…
mrkm4ntr Feb 17, 2018
44095cb
Preparing Spark release v2.3.0-rc4
sameeragarwal Feb 17, 2018
c7a0dea
Preparing development version 2.3.1-SNAPSHOT
sameeragarwal Feb 17, 2018
a1ee6f1
[SPARK-23470][UI] Use first attempt of last stage to define job descr…
Feb 21, 2018
1d78f03
[SPARK-23468][CORE] Stringify auth secret before storing it in creden…
Feb 21, 2018
3e7269e
[SPARK-23454][SS][DOCS] Added trigger information to the Structured S…
tdas Feb 21, 2018
373ac64
[SPARK-23484][SS] Fix possible race condition in KafkaContinuousReader
tdas Feb 21, 2018
23ba441
[SPARK-23481][WEBUI] lastStageAttempt should fail when a stage doesn'…
zsxwing Feb 21, 2018
a0d7949
[SPARK-23475][WEBUI] Skipped stages should be evicted before complete…
zsxwing Feb 22, 2018
992447f
Preparing Spark release v2.3.0-rc5
sameeragarwal Feb 22, 2018
285b841
Preparing development version 2.3.1-SNAPSHOT
sameeragarwal Feb 22, 2018
578607b
[SPARK-23475][UI][BACKPORT-2.3] Show also skipped stages
mgaido91 Feb 24, 2018
1f180cd
[SPARK-23438][DSTREAMS] Fix DStreams data loss with WAL when driver c…
gaborgsomogyi Feb 26, 2018
6eee545
[SPARK-23449][K8S] Preserve extraJavaOptions ordering
andrusha Feb 26, 2018
30242b6
[SPARK-23365][CORE] Do not adjust num executors when killing idle exe…
squito Feb 27, 2018
fe9cb4a
[SPARK-23448][SQL] Clarify JSON and CSV parser behavior in document
viirya Feb 28, 2018
dfa4379
[SPARK-23508][CORE] Fix BlockmanagerId in case blockManagerIdCache ca…
caneGuy Feb 28, 2018
a4eb1e4
[SPARK-23517][PYTHON] Make `pyspark.util._exception_message` produce …
HyukjinKwon Feb 28, 2018
2aa66eb
[SPARK-23551][BUILD] Exclude `hadoop-mapreduce-client-core` dependenc…
dongjoon-hyun Mar 2, 2018
56cfbd9
[SPARK-22883][ML][TEST] Streaming tests for spark.ml.feature, from A …
jkbradley Mar 2, 2018
8fe20e1
[SPARKR][DOC] fix link in vignettes
felixcheung Mar 2, 2018
f12fa13
[SPARK-23570][SQL] Add Spark 2.3.0 in HiveExternalCatalogVersionsSuite
gatorsmile Mar 2, 2018
26a8a67
[SQL][MINOR] XPathDouble prettyPrint should say 'double' not 'float'
ericl Mar 4, 2018
c8aa6fb
[SPARK-23569][PYTHON] Allow pandas_udf to work with python3 style typ…
mstewart141 Mar 5, 2018
88dd335
[MINOR][DOCS] Fix a link in "Compatibility with Apache Hive"
HyukjinKwon Mar 5, 2018
232b9f8
[SPARK-23329][SQL] Fix documentation of trigonometric functions
misutoth Mar 5, 2018
4550673
[SPARK-22882][ML][TESTS] ML test for structured streaming: ml.classif…
WeichenXu123 Mar 5, 2018
911b83d
[SPARK-23457][SQL][BRANCH-2.3] Register task completion listeners fir…
dongjoon-hyun Mar 5, 2018
b9ea2e8
[SPARK-23434][SQL][BRANCH-2.3] Spark should not warn `metadata direct…
dongjoon-hyun Mar 5, 2018
66c1978
[SPARK-23601][BUILD] Remove .md5 files from release
srowen Mar 6, 2018
8cd6a96
[SPARK-23525][SQL] Support ALTER TABLE CHANGE COLUMN COMMENT for exte…
jiangxb1987 Mar 7, 2018
ee6e797
[SPARK-23020][CORE][BRANCH-2.3] Fix another race in the in-process la…
Mar 8, 2018
86ca915
[SPARK-23524] Big local shuffle blocks should not be checked for corr…
Mar 8, 2018
1dd37ff
[SPARK-23490][BACKPORT][SQL] Check storage.locationUri with existing …
gengliangwang Mar 8, 2018
404f7e2
[SPARK-23406][SS] Enable stream-stream self-joins for branch-2.3
tdas Mar 8, 2018
8ff8e16
[SPARK-23436][SQL][BACKPORT-2.3] Infer partition as Date only if it c…
mgaido91 Mar 9, 2018
bc5ce04
[SPARK-23630][YARN] Allow user's hadoop conf customizations to take e…
Mar 9, 2018
3ec25d5
[SPARK-23628][SQL][BACKPORT-2.3] calculateParamLength should not retu…
mgaido91 Mar 9, 2018
b083bd1
[SPARK-23173][SQL] Avoid creating corrupt parquet files when loading …
mswit-databricks Mar 9, 2018
5bd306c
[SPARK-23624][SQL] Revise doc of method pushFilters in Datasource V2
gengliangwang Mar 9, 2018
265e61e
[PYTHON] Changes input variable to not conflict with built-in function
DylanGuedes Mar 10, 2018
a8e357a
[SPARK-23462][SQL] improve missing field error message in `StructType`
xysun Mar 12, 2018
33ba8db
[SPARK-23523][SQL][BACKPORT-2.3] Fix the incorrect result caused by t…
jiangxb1987 Mar 13, 2018
f3efbfa
[SPARK-23598][SQL] Make methods in BufferedRowIterator public to avoi…
kiszk Mar 13, 2018
0663b61
[SPARK-22915][MLLIB] Streaming tests for spark.ml.feature, from N to Z
attilapiros Mar 15, 2018
a9d0784
[SPARK-23642][DOCS] AccumulatorV2 subclass isZero scaladoc fix
Mar 15, 2018
72c13ed
[SPARK-23695][PYTHON] Fix the error message for Kinesis streaming tests
HyukjinKwon Mar 15, 2018
2e1e274
[SPARK-23658][LAUNCHER] InProcessAppHandle uses the wrong class in ge…
Mar 16, 2018
52a52d5
[SPARK-23671][CORE] Fix condition to enable the SHS thread pool.
Mar 16, 2018
99f5c0b
[SPARK-23608][CORE][WEBUI] Add synchronization in SHS between attachS…
zhouyejoe Mar 16, 2018
d9e1f70
[SPARK-23670][SQL] Fix memory leak on SparkPlanGraphWrapper
myroslavlisniak Mar 16, 2018
21b6de4
[SPARK-23553][TESTS] Tests should not assume the default value of `sp…
dongjoon-hyun Mar 16, 2018
6937571
[SPARK-23623][SS] Avoid concurrent use of cached consumers in CachedK…
tdas Mar 17, 2018
80e7943
[SPARK-23706][PYTHON] spark.conf.get(value, default=None) should prod…
HyukjinKwon Mar 18, 2018
9204939
[SPARK-23728][BRANCH-2.3] Fix ML tests with expected exceptions runni…
attilapiros Mar 19, 2018
5c1c03d
[SPARK-23660] Fix exception in yarn cluster mode when application end…
gaborgsomogyi Mar 20, 2018
2f82c03
[SPARK-23644][CORE][UI][BACKPORT-2.3] Use absolute path for REST call…
mgaido91 Mar 20, 2018
c854b6c
[SPARK-23691][PYTHON][BRANCH-2.3] Use sql_conf util in PySpark tests …
HyukjinKwon Mar 20, 2018
0b880db
[SPARK-23649][SQL] Skipping chars disallowed in UTF-8
MaxGekk Mar 20, 2018
1e552b3
[SPARK-23264][SQL] Fix scala.MatchError in literals.sql.out
maropu Mar 21, 2018
4b9f33f
[SPARK-23288][SS] Fix output metrics with parquet sink
gaborgsomogyi Mar 21, 2018
c9acd46
[SPARK-23729][CORE] Respect URI fragment when resolving globs
misutoth Mar 22, 2018
4da8c22
[SPARK-23760][SQL] CodegenContext.withSubExprEliminationExprs should …
rednaxelafx Mar 22, 2018
1d0d0a5
[SPARK-23614][SQL] Fix incorrect reuse exchange when caching is used
viirya Mar 23, 2018
45761ce
[MINOR][R] Fix R lint failure
HyukjinKwon Mar 23, 2018
ce0fbec
[SPARK-23769][CORE] Remove comments that unnecessarily disable Scalas…
arucard21 Mar 23, 2018
ea44783
[SPARK-23759][UI] Unable to bind Spark UI to specific host name / IP
Mar 23, 2018
523fcaf
[SPARK-23788][SS] Fix race in StreamingQuerySuite
jose-torres Mar 25, 2018
57026a1
[SPARK-23599][SQL] Add a UUID generator from Pseudo-Random Numbers
viirya Mar 19, 2018
2fd7aca
[HOT-FIX] Fix SparkOutOfMemoryError: Unable to acquire 262144 bytes o…
wangyum Mar 15, 2018
328dea6
[SPARK-23645][MINOR][DOCS][PYTHON] Add docs RE `pandas_udf` with keyw…
mstewart141 Mar 26, 2018
1c39dfa
[SPARK-23599][SQL][BACKPORT-2.3] Use RandomUUIDGenerator in Uuid expr…
viirya Mar 26, 2018
38c0bd7
[SPARK-23806] Broadcast.unpersist can cause fatal exception when used…
Mar 29, 2018
0bfbcaf
[SPARK-23785][LAUNCHER] LauncherBackend doesn't check state of connec…
Mar 29, 2018
5163045
[SPARK-23639][SQL] Obtain token before init metastore client in Spark…
yaooqinn Mar 29, 2018
1365d73
[SPARK-23808][SQL] Set default Spark session in test-only spark sessi…
jose-torres Mar 30, 2018
3f5955a
Revert "[SPARK-23785][LAUNCHER] LauncherBackend doesn't check state o…
Mar 30, 2018
507cff2
[SPARK-23827][SS] StreamingJoinExec should ensure that input data is …
tdas Mar 30, 2018
f1f10da
[SPARK-23040][BACKPORT][CORE] Returns interruptible iterator for shuf…
advancedxy Apr 1, 2018
6ca6483
[SPARK-19964][CORE] Avoid reading from remote repos in SparkSubmitSuite.
Apr 3, 2018
ce15651
[MINOR][DOC] Fix a few markdown typos
Apr 3, 2018
f36bdb4
[MINOR][CORE] Show block manager id when remove RDD/Broadcast fails.
jiangxb1987 Apr 3, 2018
28c9adb
[SPARK-23802][SQL] PropagateEmptyRelation can leave query plan in unr…
Apr 4, 2018
a81e203
[SPARK-23838][WEBUI] Running SQL query is displayed as "completed" in…
gengliangwang Apr 4, 2018
0b7b8cc
[SPARK-23637][YARN] Yarn might allocate more resource if a same execu…
Apr 4, 2018
f93667f
[SPARK-23823][SQL] Keep origin in transformExpression
Apr 6, 2018
ccc4a20
[SPARK-23822][SQL] Improve error message for Parquet schema mismatches
yuchenhuo Apr 6, 2018
1a537a2
[SPARK-23809][SQL][BACKPORT] Active SparkSession should be set by get…
ericl Apr 8, 2018
bf1dabe
[SPARK-23881][CORE][TEST] Fix flaky test JobCancellationSuite."interr…
jiangxb1987 Apr 9, 2018
0f2aabc
[SPARK-23816][CORE] Killed tasks should ignore FetchFailures.
squito Apr 9, 2018
320269e
[MINOR][DOCS] Fix R documentation generation instruction for roxygen2
HyukjinKwon Apr 11, 2018
acfc156
[SPARK-22883][ML] ML test for StructuredStreaming: spark.ml.feature, I-M
jkbradley Apr 11, 2018
03a4dfd
typo rawPredicition changed to rawPrediction
JBauerKogentix Apr 11, 2018
5712695
[SPARK-23962][SQL][TEST] Fix race in currentExecutionIds().
squito Apr 12, 2018
908c681
[SPARK-23867][SCHEDULER] use droppedCount in logWarning
Apr 13, 2018
2995b79
[SPARK-23748][SS] Fix SS continuous process doesn't support SubqueryA…
jerryshao Apr 13, 2018
dfdf1bb
[SPARK-23815][CORE] Spark writer dynamic partition overwrite mode may…
Apr 13, 2018
d4f204c
[SPARK-23942][PYTHON][SQL][BRANCH-2.3] Makes collect in PySpark as ac…
HyukjinKwon Apr 14, 2018
9857e24
[SPARK-23835][SQL] Add not-null check to Tuples' arguments deserializ…
mgaido91 Apr 17, 2018
564019b
[SPARK-23986][SQL] freshName can generate non-unique names
mgaido91 Apr 17, 2018
6b99d5b
[SPARK-23948] Trigger mapstage's job listener in submitMissingTasks
Apr 17, 2018
a1c56b6
[SPARK-24007][SQL] EqualNullSafe for FloatType and DoubleType might g…
ueshin Apr 18, 2018
5bcb7bd
[SPARK-23963][SQL] Properly handle large number of columns in query o…
bersprockets Apr 13, 2018
1306411
[SPARK-23775][TEST] Make DataFrameRangeSuite not flaky
gaborgsomogyi Apr 18, 2018
32bec6c
[SPARK-24014][PYSPARK] Add onStreamingStarted method to StreamingList…
viirya Apr 19, 2018
7fb1117
[SPARK-24021][CORE] fix bug in BlacklistTracker's updateBlacklistForF…
Ngone51 Apr 19, 2018
fb96821
[SPARK-23989][SQL] exchange should copy data before non-serialized sh…
cloud-fan Apr 19, 2018
be184d1
[SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4.3
dongjoon-hyun Apr 19, 2018
9b562d6
[SPARK-24022][TEST] Make SparkContextSuite not flaky
gaborgsomogyi Apr 19, 2018
8eb64a5
Revert "[SPARK-23775][TEST] Make DataFrameRangeSuite not flaky"
Apr 20, 2018
d914100
[SPARK-24033][SQL] Fix Mismatched of Window Frame specifiedwindowfram…
gatorsmile Apr 21, 2018
c2f4ee7
[SPARK-23799][SQL] FilterEstimation.evaluateInSet produces devision b…
Apr 22, 2018
8eb9a41
[SPARK-23004][SS] Ensure StateStore.commit is called only once in a s…
tdas Apr 23, 2018
1c3e820
Revert "[SPARK-23799][SQL] FilterEstimation.evaluateInSet produces de…
gatorsmile Apr 23, 2018
096defd
[MINOR][DOCS] Fix comments of SQLExecution#withExecutionId
seancxmao Apr 24, 2018
07ec75c
[SPARK-24062][THRIFT SERVER] Fix SASL encryption cannot enabled issue…
jerryshao Apr 26, 2018
4a10df0
[SPARK-24085][SQL] Query returns UnsupportedOperationException when s…
dilipbiswal Apr 27, 2018
df45ddb
[SPARK-24104] SQLAppStatusListener overwrites metrics onDriverAccumUp…
juliuszsompolski Apr 27, 2018
235ec9e
[MINOR][DOCS] Fix a broken link for Arrow's supported types in the pr…
HyukjinKwon Apr 30, 2018
52a420f
[SPARK-23853][PYSPARK][TEST] Run Hive-related PySpark tests only for …
dongjoon-hyun May 1, 2018
682f05d
[SPARK-23941][MESOS] Mesos task failed on specific spark app name
BounkongK May 1, 2018
88abf7b
[SPARK-24107][CORE] ChunkedByteBuffer.writeFully method has not reset…
May 2, 2018
b3adb53
[SPARK-23971][BACKPORT-2.3] Should not leak Spark sessions across tes…
ericl May 2, 2018
0fe53b6
[SPARK-23489][SQL][TEST] HiveExternalCatalogVersionsSuite should veri…
dongjoon-hyun May 3, 2018
10e2f1f
[SPARK-24166][SQL] InMemoryTableScanExec should not access SQLConf at…
cloud-fan May 3, 2018
bfe50b6
[SPARK-24133][SQL] Backport [] Check for integer overflows when resiz…
ala May 3, 2018
61e7bc0
[SPARK-24169][SQL] JsonToStructs should not access SQLConf at executo…
cloud-fan May 3, 2018
8509284
[SPARK-23433][CORE] Late zombie task completions update all tasksets
squito May 3, 2018
d35eb2f
[SPARK-24168][SQL] WindowExec should not access SQLConf at executor side
cloud-fan May 4, 2018
3f78f60
[SPARK-23697][CORE] LegacyAccumulatorWrapper should define isZero cor…
cloud-fan May 4, 2018
f87785a
[SPARK-23775][TEST] Make DataFrameRangeSuite not flaky
gaborgsomogyi May 7, 2018
3a22fea
[SPARK-23291][SQL][R][BRANCH-2.3] R's substr should not reduce starti…
HyukjinKwon May 7, 2018
4dc6719
[SPARK-24128][SQL] Mention configuration option in implicit CROSS JOI…
henryr May 8, 2018
aba52f4
[SPARK-24188][CORE] Restore "/version" API endpoint.
May 8, 2018
8889d78
[SPARK-24214][SS] Fix toJSON for StreamingRelationV2/StreamingExecuti…
zsxwing May 9, 2018
eab10f9
[SPARK-24068][BACKPORT-2.3] Propagating DataFrameReader's options to …
MaxGekk May 10, 2018
323dc3a
[PYSPARK] Update py4j to version 0.10.7.
Apr 13, 2018
16cd9ac
[SPARKR] Match pyspark features in SparkR communication protocol.
Apr 17, 2018
4c49b12
[SPARK-19181][CORE] Fixing flaky "SparkListenerSuite.local metrics"
attilapiros May 10, 2018
414e4e3
[SPARK-10878][CORE] Fix race condition when multiple clients resolves…
kiszk May 10, 2018
1d598b7
[SPARK-24067][BACKPORT-2.3][STREAMING][KAFKA] Allow non-consecutive o…
koeninger May 11, 2018
7de4bef
[SPARKR] Require Java 8 for SparkR
shivaram May 12, 2018
867d948
[SPARK-24262][PYTHON] Fix typo in UDF type match error message
robinske May 13, 2018
88003f0
[SPARK-24263][R] SparkR java check breaks with openjdk
felixcheung May 14, 2018
2f60df0
[SPARK-24246][SQL] Improve AnalysisException by setting the cause whe…
zsxwing May 14, 2018
a8ee570
[SPARK-23852][SQL] Upgrade to Parquet 1.8.3
henryr May 14, 2018
6dfb515
[SPARK-23852][SQL] Add withSQLConf(...) to test case
henryr May 14, 2018
cc93bc9
Preparing Spark release v2.3.1-rc1
vanzin May 15, 2018
eb7b373
Preparing development version 2.3.2-SNAPSHOT
vanzin May 15, 2018
a886dc2
[SPARK-23780][R] Failed to use googleVis library with new SparkR
felixcheung May 15, 2018
e16ab6f
[SPARK-24259][SQL] ArrayWriter for Arrow produces wrong output
viirya May 15, 2018
d4a892a
[SPARK-23601][BUILD][FOLLOW-UP] Keep md5 checksums for nexus artifacts.
May 16, 2018
1708de2
[SPARK-24002][SQL][BACKPORT-2.3] Task not serializable caused by org.…
gatorsmile May 17, 2018
28973e1
[SPARK-21945][YARN][PYTHON] Make --py-files work with PySpark shell i…
HyukjinKwon May 17, 2018
895c95e
[SPARK-22371][CORE] Return None instead of throwing an exception when…
artemrd May 17, 2018
d88f3e4
[SPARK-23850][SQL] Add separate config for SQL options redaction.
May 18, 2018
70b8665
[SPARK-24309][CORE] AsyncEventQueue should stop on interrupt.
squito May 21, 2018
93258d8
Preparing Spark release v2.3.1-rc2
vanzin May 22, 2018
efe183f
Preparing development version 2.3.2-SNAPSHOT
vanzin May 22, 2018
ed0060c
Correct reference to Offset class
mojodna May 23, 2018
ded6709
[SPARK-24313][SQL][BACKPORT-2.3] Fix collection operations' interpret…
mgaido91 May 23, 2018
3d2ae0b
[SPARK-24257][SQL] LongToUnsafeRowMap calculate the new size may be w…
cxzl25 May 24, 2018
75e2cd1
[SPARK-24322][BUILD] Upgrade Apache ORC to 1.4.4
dongjoon-hyun May 24, 2018
068c4ae
[SPARK-24364][SS] Prevent InMemoryFileIndex from failing if file path…
HyukjinKwon May 24, 2018
f48d624
[SPARK-24230][SQL] Fix SpecificParquetRecordReaderBase with dictionar…
rdblue May 24, 2018
d0f30e3
[SPARK-24378][SQL] Fix date_trunc function incorrect examples
wangyum May 24, 2018
54aeae7
[MINOR] Add port SSL config in toString and scaladoc
mgaido91 May 25, 2018
a06fc45
[SPARK-19112][CORE][FOLLOW-UP] Add missing shortCompressionCodecNames…
wangyum May 26, 2018
9b0f6f5
[SPARK-24334] Fix race condition in ArrowPythonRunner causes unclean …
icexelloss May 28, 2018
8bb6c22
[SPARK-24392][PYTHON] Label pandas_udf as Experimental
BryanCutler May 28, 2018
a9700cb
[SPARK-24373][SQL] Add AnalysisBarrier to RelationalGroupedDataset's …
mgaido91 May 28, 2018
fec43fe
[SPARK-19613][SS][TEST] Random.nextString is not safe for directory n…
dongjoon-hyun May 29, 2018
49a6c2b
[SPARK-23991][DSTREAMS] Fix data loss when WAL write fails in allocat…
gaborgsomogyi May 29, 2018
66289a3
[SPARK-24369][SQL] Correct handling for multiple distinct aggregation…
maropu May 30, 2018
e1c0ab1
[SPARK-23754][BRANCH-2.3][PYTHON] Re-raising StopIteration in client …
e-dorigatti May 30, 2018
3a024a4
[SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correctly into Pyth…
HyukjinKwon May 30, 2018
dc24da2
[WEBUI] Avoid possibility of script in query param keys
srowen May 31, 2018
b37e76f
[SPARK-24414][UI] Calculate the correct number of tasks for a stage.
May 31, 2018
e56266a
[SPARK-24444][DOCS][PYTHON][BRANCH-2.3] Improve Pandas UDF docs to ex…
BryanCutler Jun 1, 2018
1cc5f68
Preparing Spark release v2.3.1-rc3
vanzin Jun 1, 2018
2e0c346
Preparing development version 2.3.2-SNAPSHOT
vanzin Jun 1, 2018
e4e96f9
Revert "[SPARK-24369][SQL] Correct handling for multiple distinct agg…
gatorsmile Jun 1, 2018
2 changes: 1 addition & 1 deletion LICENSE
@@ -263,7 +263,7 @@ The text of each license is also included at licenses/LICENSE-[project].txt.
(New BSD license) Protocol Buffer Java API (org.spark-project.protobuf:protobuf-java:2.4.1-shaded - http://code.google.com/p/protobuf)
(The BSD License) Fortran to Java ARPACK (net.sourceforge.f2j:arpack_combined_all:0.1 - http://f2j.sourceforge.net)
(The BSD License) xmlenc Library (xmlenc:xmlenc:0.52 - http://xmlenc.sourceforge.net)
(The New BSD License) Py4J (net.sf.py4j:py4j:0.10.6 - http://py4j.sourceforge.net/)
(The New BSD License) Py4J (net.sf.py4j:py4j:0.10.7 - http://py4j.sourceforge.net/)
(Two-clause BSD-style license) JUnit-Interface (com.novocode:junit-interface:0.10 - http://github.com/szeiger/junit-interface/)
(BSD licence) sbt and sbt-launch-lib.bash
(BSD 3 Clause) d3.min.js (https://github.com/mbostock/d3/blob/master/LICENSE)
3 changes: 2 additions & 1 deletion R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
Package: SparkR
Type: Package
Version: 2.3.0
Version: 2.3.2
Title: R Frontend for Apache Spark
Description: Provides an R Frontend for Apache Spark.
Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
@@ -13,6 +13,7 @@ Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
License: Apache License (== 2.0)
URL: http://www.apache.org/ http://spark.apache.org/
BugReports: http://spark.apache.org/contributing.html
SystemRequirements: Java (== 8)
Depends:
R (>= 3.0),
methods
1 change: 1 addition & 0 deletions R/pkg/NAMESPACE
@@ -179,6 +179,7 @@ exportMethods("arrange",
"with",
"withColumn",
"withColumnRenamed",
"withWatermark",
"write.df",
"write.jdbc",
"write.json",
105 changes: 97 additions & 8 deletions R/pkg/R/DataFrame.R
@@ -2090,7 +2090,8 @@ setMethod("selectExpr",
#'
#' @param x a SparkDataFrame.
#' @param colName a column name.
#' @param col a Column expression, or an atomic vector in the length of 1 as literal value.
#' @param col a Column expression (which must refer only to this SparkDataFrame), or an atomic
#' vector in the length of 1 as literal value.
#' @return A SparkDataFrame with the new column added or the existing column replaced.
#' @family SparkDataFrame functions
#' @aliases withColumn,SparkDataFrame,character-method
@@ -2853,7 +2854,7 @@ setMethod("intersect",
#' except
#'
#' Return a new SparkDataFrame containing rows in this SparkDataFrame
#' but not in another SparkDataFrame. This is equivalent to \code{EXCEPT} in SQL.
#' but not in another SparkDataFrame. This is equivalent to \code{EXCEPT DISTINCT} in SQL.
#'
#' @param x a SparkDataFrame.
#' @param y a SparkDataFrame.
@@ -3054,10 +3055,10 @@ setMethod("describe",
#' \item stddev
#' \item min
#' \item max
#' \item arbitrary approximate percentiles specified as a percentage (eg, "75%")
#' \item arbitrary approximate percentiles specified as a percentage (eg, "75\%")
#' }
#' If no statistics are given, this function computes count, mean, stddev, min,
#' approximate quartiles (percentiles at 25%, 50%, and 75%), and max.
#' approximate quartiles (percentiles at 25\%, 50\%, and 75\%), and max.
#' This function is meant for exploratory data analysis, as we make no guarantee about the
#' backward compatibility of the schema of the resulting Dataset. If you want to
#' programmatically compute summary statistics, use the \code{agg} function instead.
@@ -3661,7 +3662,8 @@ setMethod("getNumPartitions",
#' isStreaming
#'
#' Returns TRUE if this SparkDataFrame contains one or more sources that continuously return data
#' as it arrives.
#' as it arrives. A dataset that reads data from a streaming source must be executed as a
#' \code{StreamingQuery} using \code{write.stream}.
#'
#' @param x A SparkDataFrame
#' @return TRUE if this SparkDataFrame is from a streaming source
@@ -3707,7 +3709,17 @@ setMethod("isStreaming",
#' @param df a streaming SparkDataFrame.
#' @param source a name for external data source.
#' @param outputMode one of 'append', 'complete', 'update'.
#' @param ... additional argument(s) passed to the method.
#' @param partitionBy a name or a list of names of columns to partition the output by on the file
#' system. If specified, the output is laid out on the file system similar to Hive's
#' partitioning scheme.
#' @param trigger.processingTime a processing time interval as a string, e.g. '5 seconds',
#' '1 minute'. This is a trigger that runs a query periodically based on the processing
#' time. If value is '0 seconds', the query will run as fast as possible, this is the
#' default. Only one trigger can be set.
#' @param trigger.once a logical, must be set to \code{TRUE}. This is a trigger that processes only
#' one batch of data in a streaming query then terminates the query. Only one trigger can be
#' set.
#' @param ... additional external data source specific named options.
#'
#' @family SparkDataFrame functions
#' @seealso \link{read.stream}
@@ -3725,7 +3737,8 @@ setMethod("isStreaming",
#' # console
#' q <- write.stream(wordCounts, "console", outputMode = "complete")
#' # text stream
#' q <- write.stream(df, "text", path = "/home/user/out", checkpointLocation = "/home/user/cp")
#' q <- write.stream(df, "text", path = "/home/user/out", checkpointLocation = "/home/user/cp"
#' partitionBy = c("year", "month"), trigger.processingTime = "30 seconds")
#' # memory stream
#' q <- write.stream(wordCounts, "memory", queryName = "outs", outputMode = "complete")
#' head(sql("SELECT * from outs"))
Expand All @@ -3737,7 +3750,8 @@ setMethod("isStreaming",
#' @note experimental
setMethod("write.stream",
signature(df = "SparkDataFrame"),
function(df, source = NULL, outputMode = NULL, ...) {
function(df, source = NULL, outputMode = NULL, partitionBy = NULL,
trigger.processingTime = NULL, trigger.once = NULL, ...) {
if (!is.null(source) && !is.character(source)) {
stop("source should be character, NULL or omitted. It is the data source specified ",
"in 'spark.sql.sources.default' configuration by default.")
@@ -3748,12 +3762,43 @@ setMethod("write.stream",
if (is.null(source)) {
source <- getDefaultSqlSource()
}
cols <- NULL
if (!is.null(partitionBy)) {
if (!all(sapply(partitionBy, function(c) { is.character(c) }))) {
stop("All partitionBy column names should be characters.")
}
cols <- as.list(partitionBy)
}
jtrigger <- NULL
if (!is.null(trigger.processingTime) && !is.na(trigger.processingTime)) {
if (!is.null(trigger.once)) {
stop("Multiple triggers not allowed.")
}
interval <- as.character(trigger.processingTime)
if (nchar(interval) == 0) {
stop("Value for trigger.processingTime must be a non-empty string.")
}
jtrigger <- handledCallJStatic("org.apache.spark.sql.streaming.Trigger",
"ProcessingTime",
interval)
} else if (!is.null(trigger.once) && !is.na(trigger.once)) {
if (!is.logical(trigger.once) || !trigger.once) {
stop("Value for trigger.once must be TRUE.")
}
jtrigger <- callJStatic("org.apache.spark.sql.streaming.Trigger", "Once")
}
options <- varargsToStrEnv(...)
write <- handledCallJMethod(df@sdf, "writeStream")
write <- callJMethod(write, "format", source)
if (!is.null(outputMode)) {
write <- callJMethod(write, "outputMode", outputMode)
}
if (!is.null(cols)) {
write <- callJMethod(write, "partitionBy", cols)
}
if (!is.null(jtrigger)) {
write <- callJMethod(write, "trigger", jtrigger)
}
write <- callJMethod(write, "options", options)
ssq <- handledCallJMethod(write, "start")
streamingQuery(ssq)
@@ -3967,3 +4012,47 @@ setMethod("broadcast",
sdf <- callJStatic("org.apache.spark.sql.functions", "broadcast", x@sdf)
dataFrame(sdf)
})

#' withWatermark
#'
#' Defines an event time watermark for this streaming SparkDataFrame. A watermark tracks a point in
#' time before which we assume no more late data is going to arrive.
#'
#' Spark will use this watermark for several purposes:
#' \itemize{
#' \item To know when a given time window aggregation can be finalized and thus can be emitted
#' when using output modes that do not allow updates.
#' \item To minimize the amount of state that we need to keep for on-going aggregations.
#' }
#' The current watermark is computed by looking at the \code{MAX(eventTime)} seen across
#' all of the partitions in the query minus a user specified \code{delayThreshold}. Due to the cost
#' of coordinating this value across partitions, the actual watermark used is only guaranteed
#' to be at least \code{delayThreshold} behind the actual event time. In some cases we may still
#' process records that arrive more than \code{delayThreshold} late.
#'
#' @param x a streaming SparkDataFrame
#' @param eventTime a string specifying the name of the Column that contains the event time of the
#' row.
#' @param delayThreshold a string specifying the minimum delay to wait for data to arrive late,
#' relative to the latest record that has been processed in the form of an
#' interval (e.g. "1 minute" or "5 hours"). NOTE: This should not be negative.
#' @return a SparkDataFrame.
#' @aliases withWatermark,SparkDataFrame,character,character-method
#' @family SparkDataFrame functions
#' @rdname withWatermark
#' @name withWatermark
#' @export
#' @examples
#' \dontrun{
#' sparkR.session()
#' schema <- structType(structField("time", "timestamp"), structField("value", "double"))
#' df <- read.stream("json", path = jsonDir, schema = schema, maxFilesPerTrigger = 1)
#' df <- withWatermark(df, "time", "10 minutes")
#' }
#' @note withWatermark since 2.3.0
setMethod("withWatermark",
signature(x = "SparkDataFrame", eventTime = "character", delayThreshold = "character"),
function(x, eventTime, delayThreshold) {
sdf <- callJMethod(x@sdf, "withWatermark", eventTime, delayThreshold)
dataFrame(sdf)
})
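
Taken together, the DataFrame.R changes above add partitionBy, trigger.processingTime and trigger.once options to write.stream and introduce withWatermark. A minimal sketch of how the new pieces combine, assuming hypothetical paths and the time/value schema used in the roxygen example:

sparkR.session()
schema <- structType(structField("time", "timestamp"), structField("value", "double"))
events <- read.stream("json", path = "/tmp/events-in", schema = schema)
events <- withWatermark(events, "time", "10 minutes")    # tolerate data up to 10 minutes late
q <- write.stream(events, "parquet",
                  path = "/tmp/events-out",
                  checkpointLocation = "/tmp/events-cp",
                  outputMode = "append",
                  partitionBy = "value",                  # lay out the output by this column on the file system
                  trigger.processingTime = "30 seconds")  # run a batch roughly every 30 seconds
# Only one trigger may be set; trigger.once = TRUE would instead process a single batch and stop.
stopQuery(q)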
4 changes: 3 additions & 1 deletion R/pkg/R/SQLContext.R
@@ -727,7 +727,9 @@ read.jdbc <- function(url, tableName,
#' @param schema The data schema defined in structType or a DDL-formatted string, this is
#' required for file-based streaming data source
#' @param ... additional external data source specific named options, for instance \code{path} for
#' file-based streaming data source
#' file-based streaming data source. \code{timeZone} to indicate a timezone to be used to
#' parse timestamps in the JSON/CSV data sources or partition values; If it isn't set, it
#' uses the default value, session local timezone.
#' @return SparkDataFrame
#' @rdname read.stream
#' @name read.stream
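
A short sketch of passing the timeZone option mentioned above through read.stream, with a hypothetical path and a DDL-formatted schema:

df <- read.stream("csv", path = "/tmp/csv-in",
                  schema = "ts TIMESTAMP, value DOUBLE",
                  timeZone = "America/Los_Angeles")  # used when parsing ts values that carry no zone info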
40 changes: 38 additions & 2 deletions R/pkg/R/client.R
@@ -19,7 +19,7 @@

# Creates a SparkR client connection object
# if one doesn't already exist
connectBackend <- function(hostname, port, timeout) {
connectBackend <- function(hostname, port, timeout, authSecret) {
if (exists(".sparkRcon", envir = .sparkREnv)) {
if (isOpen(.sparkREnv[[".sparkRCon"]])) {
cat("SparkRBackend client connection already exists\n")
Expand All @@ -29,7 +29,7 @@ connectBackend <- function(hostname, port, timeout) {

con <- socketConnection(host = hostname, port = port, server = FALSE,
blocking = TRUE, open = "wb", timeout = timeout)

doServerAuth(con, authSecret)
assign(".sparkRCon", con, envir = .sparkREnv)
con
}
@@ -60,13 +60,49 @@ generateSparkSubmitArgs <- function(args, sparkHome, jars, sparkSubmitOpts, pack
combinedArgs
}

checkJavaVersion <- function() {
javaBin <- "java"
javaHome <- Sys.getenv("JAVA_HOME")
javaReqs <- utils::packageDescription(utils::packageName(), fields = c("SystemRequirements"))
sparkJavaVersion <- as.numeric(tail(strsplit(javaReqs, "[(=)]")[[1]], n = 1L))
if (javaHome != "") {
javaBin <- file.path(javaHome, "bin", javaBin)
}

# If java is missing from PATH, we get an error in Unix and a warning in Windows
javaVersionOut <- tryCatch(
launchScript(javaBin, "-version", wait = TRUE, stdout = TRUE, stderr = TRUE),
error = function(e) {
stop("Java version check failed. Please make sure Java is installed",
" and set JAVA_HOME to point to the installation directory.", e)
},
warning = function(w) {
stop("Java version check failed. Please make sure Java is installed",
" and set JAVA_HOME to point to the installation directory.", w)
})
javaVersionFilter <- Filter(
function(x) {
grepl(" version", x)
}, javaVersionOut)

javaVersionStr <- strsplit(javaVersionFilter[[1]], "[\"]")[[1L]][2]
# javaVersionStr is of the form 1.8.0_92.
# Extract 8 from it to compare to sparkJavaVersion
javaVersionNum <- as.integer(strsplit(javaVersionStr, "[.]")[[1L]][2])
if (javaVersionNum != sparkJavaVersion) {
stop(paste("Java version", sparkJavaVersion, "is required for this package; found version:",
javaVersionStr))
}
}

launchBackend <- function(args, sparkHome, jars, sparkSubmitOpts, packages) {
sparkSubmitBinName <- determineSparkSubmitBin()
if (sparkHome != "") {
sparkSubmitBin <- file.path(sparkHome, "bin", sparkSubmitBinName)
} else {
sparkSubmitBin <- sparkSubmitBinName
}

combinedArgs <- generateSparkSubmitArgs(args, sparkHome, jars, sparkSubmitOpts, packages)
cat("Launching java with spark-submit command", sparkSubmitBin, combinedArgs, "\n")
invisible(launchScript(sparkSubmitBin, combinedArgs))
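
For reference, the string handling inside checkJavaVersion above can be traced with assumed inputs: the "Java (== 8)" requirement added to DESCRIPTION in this PR and a typical "java -version" output line:

javaReqs <- "Java (== 8)"
sparkJavaVersion <- as.numeric(tail(strsplit(javaReqs, "[(=)]")[[1]], n = 1L))  # 8
versionLine <- 'java version "1.8.0_92"'                                        # assumed java -version output
javaVersionStr <- strsplit(versionLine, "[\"]")[[1L]][2]                        # "1.8.0_92"
javaVersionNum <- as.integer(strsplit(javaVersionStr, "[.]")[[1L]][2])          # 8
javaVersionNum == sparkJavaVersion                                              # TRUE, so no error is raised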
10 changes: 8 additions & 2 deletions R/pkg/R/column.R
@@ -164,12 +164,18 @@ setMethod("alias",
#' @aliases substr,Column-method
#'
#' @param x a Column.
#' @param start starting position.
#' @param start starting position. It should be 1-based.
#' @param stop ending position.
#' @examples
#' \dontrun{
#' df <- createDataFrame(list(list(a="abcdef")))
#' collect(select(df, substr(df$a, 1, 4))) # the result is `abcd`.
#' collect(select(df, substr(df$a, 2, 4))) # the result is `bcd`.
#' }
#' @note substr since 1.4.0
setMethod("substr", signature(x = "Column"),
function(x, start, stop) {
jc <- callJMethod(x@jc, "substr", as.integer(start - 1), as.integer(stop - start + 1))
jc <- callJMethod(x@jc, "substr", as.integer(start), as.integer(stop - start + 1))
column(jc)
})

10 changes: 7 additions & 3 deletions R/pkg/R/deserialize.R
@@ -60,14 +60,18 @@ readTypedObject <- function(con, type) {
stop(paste("Unsupported type for deserialization", type)))
}

readString <- function(con) {
stringLen <- readInt(con)
raw <- readBin(con, raw(), stringLen, endian = "big")
readStringData <- function(con, len) {
raw <- readBin(con, raw(), len, endian = "big")
string <- rawToChar(raw)
Encoding(string) <- "UTF-8"
string
}

readString <- function(con) {
stringLen <- readInt(con)
readStringData(con, stringLen)
}

readInt <- function(con) {
readBin(con, integer(), n = 1, endian = "big")
}
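
The refactoring above splits readString into readInt plus a reusable readStringData. A self-contained sketch of the framing these helpers expect (a 4-byte big-endian length followed by that many UTF-8 bytes), using an in-memory rawConnection in place of the backend socket and assuming the definitions above are sourced:

con <- rawConnection(raw(0), "r+")
writeBin(5L, con, endian = "big")   # length prefix
writeBin(charToRaw("hello"), con)   # payload bytes
seek(con, 0)
readString(con)                     # returns "hello"
close(con)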