
[SPARK-15763][SQL] Support DELETE FILE command natively #13506

Closed
wants to merge 34 commits

Conversation

kevinyu98
Contributor

What changes were proposed in this pull request?

Hive supports these CLI commands to manage resources (see the Hive docs):
ADD/DELETE (FILE(s) <filepath ..> | JAR(s) <jarpath ..>)
LIST (FILE(S) [filepath ...] | JAR(S) [jarpath ...])

but Spark currently supports only two commands:
ADD (FILE <filepath> | JAR <jarpath>)
LIST (FILE(S) [filepath ...] | JAR(S) [jarpath ...])

This PR adds the DELETE FILE command to Spark SQL; I will submit another PR for DELETE JAR(s).

DELETE FILE <filepath>

Example:

DELETE FILE

scala> spark.sql("add file /Users/qianyangyu/myfile.txt")
res0: org.apache.spark.sql.DataFrame = []

scala> spark.sql("add file /Users/qianyangyu/myfile2.txt")
res1: org.apache.spark.sql.DataFrame = []

scala> spark.sql("list file")
res2: org.apache.spark.sql.DataFrame = [Results: string]

scala> spark.sql("list file").show(false)
+----------------------------------+
|Results                           |
+----------------------------------+
|file:/Users/qianyangyu/myfile2.txt|
|file:/Users/qianyangyu/myfile.txt |
+----------------------------------+
scala> spark.sql("delete file /Users/qianyangyu/myfile.txt")
res4: org.apache.spark.sql.DataFrame = []

scala> spark.sql("list file").show(false)
+----------------------------------+
|Results                           |
+----------------------------------+
|file:/Users/qianyangyu/myfile2.txt|
+----------------------------------+


scala> spark.sql("delete file /Users/qianyangyu/myfile2.txt")
res6: org.apache.spark.sql.DataFrame = []

scala> spark.sql("list file").show(false)
+-------+
|Results|
+-------+
+-------+
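For context, a native command like this is typically implemented as a runnable command in Spark SQL. The sketch below is a simplified illustration of that shape, modeled on the existing ADD FILE command; `DeleteFileCommand` is a hypothetical class name, and it delegates to the SparkContext-level `deleteFile(path)` method this PR introduces (shown here in simplified form, not the PR's exact code):

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.execution.command.RunnableCommand

// Simplified sketch (not the PR's exact code): a runnable command that
// delegates to the SparkContext-level deleteFile(path) proposed by this PR.
case class DeleteFileCommand(path: String) extends RunnableCommand {
  override def run(sparkSession: SparkSession): Seq[Row] = {
    sparkSession.sparkContext.deleteFile(path)
    Seq.empty[Row]
  }
}
```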

How was this patch tested?

Added test cases to the Spark SQL, Spark shell, and SparkContext suites.


get latest code from upstream
adding trim characters support
@AmplabJenkins

Can one of the admins verify this patch?

* use `SparkFiles.get(fileName)` to find its download location.
*
*/
def deleteFile(path: String): Unit = {
Contributor

This is fairly confusing -- I'd assume this is actually deleting the path given.

Contributor Author

Hi Reynold: Thanks very much for reviewing the code.
Yes, it deletes the path from the addedFiles hashmap; the path is used to generate the key stored in the map.
addFile uses this logic to generate the key it stores in the hashmap, so in order to find the same key, I have to use the same logic to generate it.
For example, for a local file, addFile prepends a 'file:' scheme to the path:
spark.sql("add file /Users/qianyangyu/myfile.txt")

scala> spark.sql("list file").show(false)
+----------------------------------+
|Results                           |
+----------------------------------+
|file:/Users/qianyangyu/myfile2.txt|
|file:/Users/qianyangyu/myfile.txt |
+----------------------------------+

But for a file at a remote location, it just uses the path as-is.

scala> spark.sql("add file hdfs://bdavm009.svl.ibm.com:8020/tmp/test.txt")
res17: org.apache.spark.sql.DataFrame = []

scala> spark.sql("list file").show(false)
+---------------------------------------------+
|Results                                      |
+---------------------------------------------+
|file:/Users/qianyangyu/myfile.txt |
|hdfs://bdavm009.svl.ibm.com:8020/tmp/test.txt|
+---------------------------------------------+

If the command is issued from a worker node and adds a local file, the path is added to the NettyStreamManager's hashmap, and that environment's path is used as the key stored in addedFiles.
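The key-matching behavior described above can be sketched as follows. This is a simplified illustration of the idea rather than the actual Spark code, and `fileKey` is a hypothetical helper name:

```scala
import java.net.URI
import java.io.File

// Hypothetical helper illustrating the key normalization described above:
// a local path (no URI scheme) gets a "file:" prefix, while a path that
// already carries a scheme (e.g. hdfs://) is used as-is.
def fileKey(path: String): String = {
  val uri = new URI(path)
  if (uri.getScheme == null) "file:" + new File(path).getAbsolutePath
  else path
}

// fileKey("/Users/qianyangyu/myfile.txt") yields a "file:"-prefixed path,
// while fileKey("hdfs://bdavm009.svl.ibm.com:8020/tmp/test.txt") is
// returned unchanged.
```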

Contributor Author

I have updated the deleteFile comments to make them clearer. Thanks for reviewing.

@vanzin
Contributor

vanzin commented Aug 4, 2016

@kevinyu98 Could you update the PR and fix merge conflicts? Thanks

@kevinyu98
Contributor Author

@vanzin Hello Marcelo: I am so sorry that I didn't notice your update. I have fixed the merge conflicts; can you help review it? Thanks.

@gatorsmile
Member

@kevinyu98 Can you please close it? It seems there is not much interest in adding this functionality natively to Spark. If anybody wants this feature, we can reopen it later.

@kevinyu98
Contributor Author

sure

@gatorsmile
Member

We are closing it due to inactivity. Please do reopen it if you want to push it forward. Thanks!
