Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-5821] [SQL] JSON CTAS command should throw error message when delete path failure #4610

Closed
wants to merge 7 commits into from

Conversation

yanboliang
Copy link
Contributor

When using "CREATE TEMPORARY TABLE AS SELECT" to create JSON table, we first delete the path file or directory and then generate a new directory with the same name. But if only read permission was granted, the delete failed.
Here we just throwing an error message to let users know what happened.
ParquetRelation2 may also hit this problem. I think to restrict JSONRelation and ParquetRelation2 must base on directory is more reasonable for access control. Maybe I can do it in follow up works.

@yanboliang yanboliang changed the title JSON external data source INSERT improvements initial draft [SPARK-5821] [SPARK-5746] [SQL] JSON external data source INSERT improvements initial draft Feb 15, 2015
@SparkQA
Copy link

SparkQA commented Feb 15, 2015

Test build #27506 has started for PR 4610 at commit 307278f.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Feb 15, 2015

Test build #27506 has finished for PR 4610 at commit 307278f.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27506/
Test FAILed.

@yanboliang yanboliang changed the title [SPARK-5821] [SPARK-5746] [SQL] JSON external data source INSERT improvements initial draft [SPARK-5821] [SQL] JSON external data source INSERT improvements initial draft Feb 15, 2015
@SparkQA
Copy link

SparkQA commented Feb 15, 2015

Test build #27507 has started for PR 4610 at commit 3156c1b.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Feb 15, 2015

Test build #27507 has finished for PR 4610 at commit 3156c1b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27507/
Test FAILed.

// Only save data when the save mode is not ignore.
data.toJSON.saveAsTextFile(path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just save the data?

@yhuai
Copy link
Contributor

yhuai commented Feb 15, 2015

@yanbohappy It will be great if we use this PR to only do the task of throwing a nice error message when the delete returns false and data already exists in the code path for overwrite.

@SparkQA
Copy link

SparkQA commented Feb 15, 2015

Test build #27509 has started for PR 4610 at commit 3cdfb7a.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Feb 15, 2015

Test build #27509 has finished for PR 4610 at commit 3cdfb7a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27509/
Test FAILed.

@yanboliang yanboliang changed the title [SPARK-5821] [SQL] JSON external data source INSERT improvements initial draft [SPARK-5821] [SQL] JSON external data source insert error due to with permission Feb 15, 2015
@yanboliang yanboliang changed the title [SPARK-5821] [SQL] JSON external data source insert error due to with permission [SPARK-5821] [SQL] JSON external data source insert error due to without permission Feb 15, 2015
@yanboliang
Copy link
Contributor Author

@yhuai
OK, if we don't support write to a table while reading it, the code will be more concise.
I will try to rewrite the code to focus on the issue SPARK-5812.
Thank you for your comments.

@yanboliang yanboliang force-pushed the jsonInsertImprovements branch from 3cdfb7a to 5a42d83 Compare February 16, 2015 10:00
@SparkQA
Copy link

SparkQA commented Feb 16, 2015

Test build #27551 has started for PR 4610 at commit 5a42d83.

  • This patch merges cleanly.

@yanboliang yanboliang changed the title [SPARK-5821] [SQL] JSON external data source insert error due to without permission [SPARK-5821] [SQL] JSON CTAS command should throw error message when delete path failure Feb 16, 2015
@SparkQA
Copy link

SparkQA commented Feb 16, 2015

Test build #27551 has finished for PR 4610 at commit 5a42d83.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27551/
Test PASSed.

throw new IOException(
s"Unable to clear output directory ${filesystemPath.toString} prior"
+ s" to CREATE a JSON table AS SELECT:\n${e.toString}")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yanbohappy Seems we just throw another error message at here. Based on your JIRA description, I think you need to check if delete returns true or false when data already exists.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yhuai Yes, it is expected. If there is no exception thrown, it can ensure delete success. I have investigated other codes in spark and they do the same thing for delete.

@marmbrus
Copy link
Contributor

@yhuai what is the status here?

@yhuai
Copy link
Contributor

yhuai commented Mar 14, 2015

Sorry for not following it. I will investigate it and reproduce the error reported in jira. I should have more ideas on how we want to have the change early next week.

@SparkQA
Copy link

SparkQA commented Mar 16, 2015

Test build #28643 has started for PR 4610 at commit e4bc229.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Mar 18, 2015

Test build #28784 has finished for PR 4610 at commit 46f0d9d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28784/
Test PASSed.

if (!fs.delete(filesystemPath, true)) {
throw new IOException(
s"Unable to clear output directory ${filesystemPath.toString} prior"
+ s" to INSERT OVERWRITE a JSON table:\n")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we say "prior to writing to JSON file" since a user can reach this code path through DataFrame.save, which is not related to table operation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, get rid of :\n and add a . at the end of the message.

@yhuai
Copy link
Contributor

yhuai commented Mar 19, 2015

Thank you for the update! It's very close. I think we want to do the same thing for ParquetRelation2. Is it possible to add a test for this?

cc @liancheng

@SparkQA
Copy link

SparkQA commented Mar 19, 2015

Test build #28860 has started for PR 4610 at commit c387fce.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Mar 19, 2015

Test build #28860 has finished for PR 4610 at commit c387fce.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28860/
Test PASSed.

@liancheng
Copy link
Contributor

@yhuai Thanks for cc-ing me this. Although ParquetRelation2 checks for exception, it doesn't check return value of fs.delete, and thus suffer the same issue. Opening a separate PR for this.

@yanboliang Thanks for pointing out the bug and working on this!

@yanboliang
Copy link
Contributor Author

@yhuai Thank you for your comments on this PR
@liancheng If this LGTM, can I open another PR for ParquetRelation2 and work on it?

@liancheng
Copy link
Contributor

@yanboliang Sure, please go ahead. I'm trying to verify this issue locally.

@liancheng
Copy link
Contributor

@yanboliang Adding a check for return value at this line should be enough.

@yanboliang
Copy link
Contributor Author

@liancheng Please review my fix for ParquetRelation2 in #5107

asfgit pushed a commit that referenced this pull request Mar 21, 2015
…ccessful

Do the same check as #4610 for ParquetRelation2.

Author: Yanbo Liang <[email protected]>

Closes #5107 from yanboliang/spark-5821-parquet and squashes the following commits:

7092c8d [Yanbo Liang] ParquetRelation2 CTAS should check if delete is successful
asfgit pushed a commit that referenced this pull request Mar 21, 2015
…ccessful

Do the same check as #4610 for ParquetRelation2.

Author: Yanbo Liang <[email protected]>

Closes #5107 from yanboliang/spark-5821-parquet and squashes the following commits:

7092c8d [Yanbo Liang] ParquetRelation2 CTAS should check if delete is successful

(cherry picked from commit bc37c97)
Signed-off-by: Cheng Lian <[email protected]>
asfgit pushed a commit that referenced this pull request Mar 21, 2015
…delete path failure

When using "CREATE TEMPORARY TABLE AS SELECT" to create JSON table, we first delete the path file or directory and then generate a new directory with the same name. But if only read permission was granted, the delete failed.
Here we just throwing an error message to let users know what happened.
ParquetRelation2 may also hit this problem. I think to restrict JSONRelation and ParquetRelation2 must base on directory is more reasonable for access control. Maybe I can do it in follow up works.

Author: Yanbo Liang <[email protected]>
Author: Yanbo Liang <[email protected]>

Closes #4610 from yanboliang/jsonInsertImprovements and squashes the following commits:

c387fce [Yanbo Liang] fix typos
42d7fb6 [Yanbo Liang] add unittest & fix output format
46f0d9d [Yanbo Liang] Update JSONRelation.scala
e2df8d5 [Yanbo Liang] check path exisit when write
79f7040 [Yanbo Liang] Update JSONRelation.scala
e4bc229 [Yanbo Liang] Update JSONRelation.scala
5a42d83 [Yanbo Liang] JSONRelation CTAS should check if delete is successful

(cherry picked from commit e5d2c37)
Signed-off-by: Cheng Lian <[email protected]>
@asfgit asfgit closed this in e5d2c37 Mar 21, 2015
@liancheng
Copy link
Contributor

Thanks, merged into master and branch-1.3.

@liancheng
Copy link
Contributor

PS: #5107 has also been merged.

@yanboliang yanboliang deleted the jsonInsertImprovements branch April 24, 2015 10:02
LuciferYang pushed a commit that referenced this pull request Jul 9, 2024
### What changes were proposed in this pull request?

This PR amis to upgrade `fasterxml.jackson` from 2.17.1 to 2.17.2.

### Why are the changes needed?

There are some bug fixes about [Databind](https://github.com/FasterXML/jackson-databind):
[#4561](FasterXML/jackson-databind#4561): Issues using jackson-databind 2.17.1 with Reactor (wrt DeserializerCache and ReentrantLock)
[#4575](FasterXML/jackson-databind#4575): StdDelegatingSerializer does not consider a Converter that may return null for a non-null input
[#4577](FasterXML/jackson-databind#4577): Cannot deserialize value of type java.math.BigDecimal from String "3." (not a valid representation)
[#4595](FasterXML/jackson-databind#4595): No way to explicitly disable wrapping in custom annotation processor
[#4607](FasterXML/jackson-databind#4607): MismatchedInput: No Object Id found for an instance of X to assign to property 'id'
[#4610](FasterXML/jackson-databind#4610): DeserializationFeature.FAIL_ON_UNRESOLVED_OBJECT_IDS does not work when used with Polymorphic type handling

The full release note of 2.17.2:
https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.17.2

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass GA.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47241 from wayneguow/upgrade_jackson.

Authored-by: Wei Guo <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
ericm-db pushed a commit to ericm-db/spark that referenced this pull request Jul 10, 2024
### What changes were proposed in this pull request?

This PR amis to upgrade `fasterxml.jackson` from 2.17.1 to 2.17.2.

### Why are the changes needed?

There are some bug fixes about [Databind](https://github.com/FasterXML/jackson-databind):
[apache#4561](FasterXML/jackson-databind#4561): Issues using jackson-databind 2.17.1 with Reactor (wrt DeserializerCache and ReentrantLock)
[apache#4575](FasterXML/jackson-databind#4575): StdDelegatingSerializer does not consider a Converter that may return null for a non-null input
[apache#4577](FasterXML/jackson-databind#4577): Cannot deserialize value of type java.math.BigDecimal from String "3." (not a valid representation)
[apache#4595](FasterXML/jackson-databind#4595): No way to explicitly disable wrapping in custom annotation processor
[apache#4607](FasterXML/jackson-databind#4607): MismatchedInput: No Object Id found for an instance of X to assign to property 'id'
[apache#4610](FasterXML/jackson-databind#4610): DeserializationFeature.FAIL_ON_UNRESOLVED_OBJECT_IDS does not work when used with Polymorphic type handling

The full release note of 2.17.2:
https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.17.2

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass GA.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#47241 from wayneguow/upgrade_jackson.

Authored-by: Wei Guo <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
jingz-db pushed a commit to jingz-db/spark that referenced this pull request Jul 22, 2024
### What changes were proposed in this pull request?

This PR amis to upgrade `fasterxml.jackson` from 2.17.1 to 2.17.2.

### Why are the changes needed?

There are some bug fixes about [Databind](https://github.com/FasterXML/jackson-databind):
[apache#4561](FasterXML/jackson-databind#4561): Issues using jackson-databind 2.17.1 with Reactor (wrt DeserializerCache and ReentrantLock)
[apache#4575](FasterXML/jackson-databind#4575): StdDelegatingSerializer does not consider a Converter that may return null for a non-null input
[apache#4577](FasterXML/jackson-databind#4577): Cannot deserialize value of type java.math.BigDecimal from String "3." (not a valid representation)
[apache#4595](FasterXML/jackson-databind#4595): No way to explicitly disable wrapping in custom annotation processor
[apache#4607](FasterXML/jackson-databind#4607): MismatchedInput: No Object Id found for an instance of X to assign to property 'id'
[apache#4610](FasterXML/jackson-databind#4610): DeserializationFeature.FAIL_ON_UNRESOLVED_OBJECT_IDS does not work when used with Polymorphic type handling

The full release note of 2.17.2:
https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.17.2

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass GA.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#47241 from wayneguow/upgrade_jackson.

Authored-by: Wei Guo <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
### What changes were proposed in this pull request?

This PR amis to upgrade `fasterxml.jackson` from 2.17.1 to 2.17.2.

### Why are the changes needed?

There are some bug fixes about [Databind](https://github.com/FasterXML/jackson-databind):
[apache#4561](FasterXML/jackson-databind#4561): Issues using jackson-databind 2.17.1 with Reactor (wrt DeserializerCache and ReentrantLock)
[apache#4575](FasterXML/jackson-databind#4575): StdDelegatingSerializer does not consider a Converter that may return null for a non-null input
[apache#4577](FasterXML/jackson-databind#4577): Cannot deserialize value of type java.math.BigDecimal from String "3." (not a valid representation)
[apache#4595](FasterXML/jackson-databind#4595): No way to explicitly disable wrapping in custom annotation processor
[apache#4607](FasterXML/jackson-databind#4607): MismatchedInput: No Object Id found for an instance of X to assign to property 'id'
[apache#4610](FasterXML/jackson-databind#4610): DeserializationFeature.FAIL_ON_UNRESOLVED_OBJECT_IDS does not work when used with Polymorphic type handling

The full release note of 2.17.2:
https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.17.2

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass GA.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#47241 from wayneguow/upgrade_jackson.

Authored-by: Wei Guo <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants