[SPARK-26673][FOLLOWUP][SQL] File Source V2: check existence of output path before delete it

## What changes were proposed in this pull request?
This is a follow-up PR to resolve the review comment on apache#23601 (review).

When Spark writes a DataFrame with "overwrite" mode, it deletes the output path before the actual write. To safely handle the case where the output path doesn't exist, it is suggested to follow the V1 code path and check for existence before deleting.
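The check-before-delete pattern described above can be sketched as follows. This is an illustrative Python analogy only, not the actual implementation: the real code is Scala and goes through Hadoop's `FileSystem.exists` and the commit protocol's `deleteWithJob`, raising an `IOException` when the delete fails.

```python
import shutil
from pathlib import Path


def clear_output_path(path: str) -> None:
    """Sketch of "overwrite" semantics: delete the output directory
    only if it exists, and fail loudly if the delete does not succeed.
    A path that does not exist yet is not an error."""
    out = Path(path)
    if not out.exists():
        return  # nothing to clear; mirrors the fs.exists(path) guard
    try:
        shutil.rmtree(out)
    except OSError as e:
        # Mirrors the IOException thrown when deleteWithJob returns false.
        raise IOError(
            f"Unable to clear directory {path} prior to writing to it"
        ) from e
```

The key point, as in the patch, is that the existence check is made before the delete, so overwriting a not-yet-existing output location succeeds silently instead of failing on the delete.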

## How was this patch tested?

Apply apache#23836 and run the unit tests.

Closes apache#23889 from gengliangwang/checkFileBeforeOverwrite.

Authored-by: Gengliang Wang <[email protected]>
Signed-off-by: gatorsmile <[email protected]>
gengliangwang authored and mccheah committed May 15, 2019
1 parent 85d0f08 commit 4db1e19
Showing 1 changed file with 4 additions and 1 deletion.
@@ -16,6 +16,7 @@
  */
 package org.apache.spark.sql.execution.datasources.v2

+import java.io.IOException
 import java.util.UUID

 import scala.collection.JavaConverters._
@@ -83,7 +84,9 @@ abstract class FileWriteBuilder(options: DataSourceOptions)
         null

       case SaveMode.Overwrite =>
-        committer.deleteWithJob(fs, path, true)
+        if (fs.exists(path) && !committer.deleteWithJob(fs, path, true)) {
+          throw new IOException(s"Unable to clear directory $path prior to writing to it")
+        }
         committer.setupJob(job)
         new FileBatchWrite(job, description, committer)