-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-5821] [SQL] JSON CTAS command should throw error message when delete path failure #4610
Conversation
Test build #27506 has started for PR 4610 at commit
|
Test build #27506 has finished for PR 4610 at commit
|
Test FAILed. |
Test build #27507 has started for PR 4610 at commit
|
Test build #27507 has finished for PR 4610 at commit
|
Test FAILed. |
// Only save data when the save mode is not ignore. | ||
data.toJSON.saveAsTextFile(path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just save the data?
@yanbohappy It will be great if we use this PR to only do the task of throwing a nice error message when the |
Test build #27509 has started for PR 4610 at commit
|
Test build #27509 has finished for PR 4610 at commit
|
Test FAILed. |
@yhuai |
3cdfb7a
to
5a42d83
Compare
Test build #27551 has started for PR 4610 at commit
|
Test build #27551 has finished for PR 4610 at commit
|
Test PASSed. |
throw new IOException( | ||
s"Unable to clear output directory ${filesystemPath.toString} prior" | ||
+ s" to CREATE a JSON table AS SELECT:\n${e.toString}") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yanbohappy Seems we just throw another error message at here. Based on your JIRA description, I think you need to check if delete returns true or false when data already exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yhuai Yes, it is expected. If there is no exception thrown, it can ensure delete success. I have investigated other codes in spark and they do the same thing for delete.
@yhuai what is the status here? |
Sorry for not following it. I will investigate it and reproduce the error reported in jira. I should have more ideas on how we want to have the change early next week. |
Test build #28643 has started for PR 4610 at commit
|
Test build #28784 has finished for PR 4610 at commit
|
Test PASSed. |
if (!fs.delete(filesystemPath, true)) { | ||
throw new IOException( | ||
s"Unable to clear output directory ${filesystemPath.toString} prior" | ||
+ s" to INSERT OVERWRITE a JSON table:\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we say "prior to writing to JSON file" since a user can reach this code path through DataFrame.save
, which is not related to table operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, get rid of :\n
and add a .
at the end of the message.
Thank you for the update! It's very close. I think we want to do the same thing for cc @liancheng |
Test build #28860 has started for PR 4610 at commit
|
Test build #28860 has finished for PR 4610 at commit
|
Test PASSed. |
@yhuai Thanks for cc-ing me this. Although @yanboliang Thanks for pointing out the bug and working on this! |
@yhuai Thank you for your comments on this PR |
@yanboliang Sure, please go ahead. I'm trying to verify this issue locally. |
@yanboliang Adding a check for return value at this line should be enough. |
@liancheng Please review my fix for ParquetRelation2 in #5107 |
…ccessful Do the same check as #4610 for ParquetRelation2. Author: Yanbo Liang <[email protected]> Closes #5107 from yanboliang/spark-5821-parquet and squashes the following commits: 7092c8d [Yanbo Liang] ParquetRelation2 CTAS should check if delete is successful
…ccessful Do the same check as #4610 for ParquetRelation2. Author: Yanbo Liang <[email protected]> Closes #5107 from yanboliang/spark-5821-parquet and squashes the following commits: 7092c8d [Yanbo Liang] ParquetRelation2 CTAS should check if delete is successful (cherry picked from commit bc37c97) Signed-off-by: Cheng Lian <[email protected]>
…delete path failure When using "CREATE TEMPORARY TABLE AS SELECT" to create JSON table, we first delete the path file or directory and then generate a new directory with the same name. But if only read permission was granted, the delete failed. Here we just throwing an error message to let users know what happened. ParquetRelation2 may also hit this problem. I think to restrict JSONRelation and ParquetRelation2 must base on directory is more reasonable for access control. Maybe I can do it in follow up works. Author: Yanbo Liang <[email protected]> Author: Yanbo Liang <[email protected]> Closes #4610 from yanboliang/jsonInsertImprovements and squashes the following commits: c387fce [Yanbo Liang] fix typos 42d7fb6 [Yanbo Liang] add unittest & fix output format 46f0d9d [Yanbo Liang] Update JSONRelation.scala e2df8d5 [Yanbo Liang] check path exisit when write 79f7040 [Yanbo Liang] Update JSONRelation.scala e4bc229 [Yanbo Liang] Update JSONRelation.scala 5a42d83 [Yanbo Liang] JSONRelation CTAS should check if delete is successful (cherry picked from commit e5d2c37) Signed-off-by: Cheng Lian <[email protected]>
Thanks, merged into master and branch-1.3. |
PS: #5107 has also been merged. |
### What changes were proposed in this pull request? This PR amis to upgrade `fasterxml.jackson` from 2.17.1 to 2.17.2. ### Why are the changes needed? There are some bug fixes about [Databind](https://github.com/FasterXML/jackson-databind): [#4561](FasterXML/jackson-databind#4561): Issues using jackson-databind 2.17.1 with Reactor (wrt DeserializerCache and ReentrantLock) [#4575](FasterXML/jackson-databind#4575): StdDelegatingSerializer does not consider a Converter that may return null for a non-null input [#4577](FasterXML/jackson-databind#4577): Cannot deserialize value of type java.math.BigDecimal from String "3." (not a valid representation) [#4595](FasterXML/jackson-databind#4595): No way to explicitly disable wrapping in custom annotation processor [#4607](FasterXML/jackson-databind#4607): MismatchedInput: No Object Id found for an instance of X to assign to property 'id' [#4610](FasterXML/jackson-databind#4610): DeserializationFeature.FAIL_ON_UNRESOLVED_OBJECT_IDS does not work when used with Polymorphic type handling The full release note of 2.17.2: https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.17.2 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #47241 from wayneguow/upgrade_jackson. Authored-by: Wei Guo <[email protected]> Signed-off-by: yangjie01 <[email protected]>
### What changes were proposed in this pull request? This PR amis to upgrade `fasterxml.jackson` from 2.17.1 to 2.17.2. ### Why are the changes needed? There are some bug fixes about [Databind](https://github.com/FasterXML/jackson-databind): [apache#4561](FasterXML/jackson-databind#4561): Issues using jackson-databind 2.17.1 with Reactor (wrt DeserializerCache and ReentrantLock) [apache#4575](FasterXML/jackson-databind#4575): StdDelegatingSerializer does not consider a Converter that may return null for a non-null input [apache#4577](FasterXML/jackson-databind#4577): Cannot deserialize value of type java.math.BigDecimal from String "3." (not a valid representation) [apache#4595](FasterXML/jackson-databind#4595): No way to explicitly disable wrapping in custom annotation processor [apache#4607](FasterXML/jackson-databind#4607): MismatchedInput: No Object Id found for an instance of X to assign to property 'id' [apache#4610](FasterXML/jackson-databind#4610): DeserializationFeature.FAIL_ON_UNRESOLVED_OBJECT_IDS does not work when used with Polymorphic type handling The full release note of 2.17.2: https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.17.2 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#47241 from wayneguow/upgrade_jackson. Authored-by: Wei Guo <[email protected]> Signed-off-by: yangjie01 <[email protected]>
### What changes were proposed in this pull request? This PR amis to upgrade `fasterxml.jackson` from 2.17.1 to 2.17.2. ### Why are the changes needed? There are some bug fixes about [Databind](https://github.com/FasterXML/jackson-databind): [apache#4561](FasterXML/jackson-databind#4561): Issues using jackson-databind 2.17.1 with Reactor (wrt DeserializerCache and ReentrantLock) [apache#4575](FasterXML/jackson-databind#4575): StdDelegatingSerializer does not consider a Converter that may return null for a non-null input [apache#4577](FasterXML/jackson-databind#4577): Cannot deserialize value of type java.math.BigDecimal from String "3." (not a valid representation) [apache#4595](FasterXML/jackson-databind#4595): No way to explicitly disable wrapping in custom annotation processor [apache#4607](FasterXML/jackson-databind#4607): MismatchedInput: No Object Id found for an instance of X to assign to property 'id' [apache#4610](FasterXML/jackson-databind#4610): DeserializationFeature.FAIL_ON_UNRESOLVED_OBJECT_IDS does not work when used with Polymorphic type handling The full release note of 2.17.2: https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.17.2 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#47241 from wayneguow/upgrade_jackson. Authored-by: Wei Guo <[email protected]> Signed-off-by: yangjie01 <[email protected]>
### What changes were proposed in this pull request? This PR amis to upgrade `fasterxml.jackson` from 2.17.1 to 2.17.2. ### Why are the changes needed? There are some bug fixes about [Databind](https://github.com/FasterXML/jackson-databind): [apache#4561](FasterXML/jackson-databind#4561): Issues using jackson-databind 2.17.1 with Reactor (wrt DeserializerCache and ReentrantLock) [apache#4575](FasterXML/jackson-databind#4575): StdDelegatingSerializer does not consider a Converter that may return null for a non-null input [apache#4577](FasterXML/jackson-databind#4577): Cannot deserialize value of type java.math.BigDecimal from String "3." (not a valid representation) [apache#4595](FasterXML/jackson-databind#4595): No way to explicitly disable wrapping in custom annotation processor [apache#4607](FasterXML/jackson-databind#4607): MismatchedInput: No Object Id found for an instance of X to assign to property 'id' [apache#4610](FasterXML/jackson-databind#4610): DeserializationFeature.FAIL_ON_UNRESOLVED_OBJECT_IDS does not work when used with Polymorphic type handling The full release note of 2.17.2: https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.17.2 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#47241 from wayneguow/upgrade_jackson. Authored-by: Wei Guo <[email protected]> Signed-off-by: yangjie01 <[email protected]>
When using "CREATE TEMPORARY TABLE AS SELECT" to create JSON table, we first delete the path file or directory and then generate a new directory with the same name. But if only read permission was granted, the delete failed.
Here we just throwing an error message to let users know what happened.
ParquetRelation2 may also hit this problem. I think to restrict JSONRelation and ParquetRelation2 must base on directory is more reasonable for access control. Maybe I can do it in follow up works.