-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expose SparkListeners and relevant classes as DeveloperApi #648
Conversation
* Identifies a particular Block of data, usually associated with a single file. | ||
* A Block can be uniquely identified by its filename, but each type of Block has a different | ||
* set of keys which produce its unique name. | ||
* | ||
* If your BlockId should be serializable, be sure to add it to the BlockId.apply() method. | ||
*/ | ||
private[spark] sealed abstract class BlockId { | ||
@DeveloperApi | ||
sealed abstract class BlockId { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just expose BlockId's as strings instead of revealing this class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean exposing name
and toString
only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just mean we should pass these around as String's in the events. Would that be a big change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it makes sense to give users some more semantics about this... it might be good to leave it as is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be a fairly big change, since we technically don't need the BlockId classes anymore, as the strings are supposedly globally unique. I'm inclined to just leave it.
Merged build triggered. |
Merged build started. |
Merged build finished. All automated tests passed. |
All automated tests passed. |
Thanks, I merged this. |
Hopefully this can go into 1.0, as a few people on the user list have asked for this. Author: Andrew Or <[email protected]> Closes #648 from andrewor14/expose-listeners and squashes the following commits: e45e1ef [Andrew Or] Add missing colons (minor) 350d643 [Andrew Or] Expose SparkListeners and relevant classes as DeveloperApi (cherry picked from commit ea10b31) Signed-off-by: Patrick Wendell <[email protected]>
Hopefully this can go into 1.0, as a few people on the user list have asked for this. Author: Andrew Or <[email protected]> Closes apache#648 from andrewor14/expose-listeners and squashes the following commits: e45e1ef [Andrew Or] Add missing colons (minor) 350d643 [Andrew Or] Expose SparkListeners and relevant classes as DeveloperApi
These listeners weren't really meant for external consumption, but they're public and marked with @DeveloperAPI. Adding the deprecated tag warns people that they may soon go away (as they will as part of the work for SPARK-18085. Note that not all types made public by apache#648 are being deprecated. Some remaining types are still exposed through the SparkListener API. Also note the text for StorageStatus is a tiny bit different, since I'm not so sure I'll be able to remove it. But the effect for the users should be the same (they should stop trying to use it).
These listeners weren't really meant for external consumption, but they're public and marked with DeveloperApi. Adding the deprecated tag warns people that they may soon go away (as they will as part of the work for SPARK-18085). Note that not all types made public by #648 are being deprecated. Some remaining types are still exposed through the SparkListener API. Also note the text for StorageStatus is a tiny bit different, since I'm not so sure I'll be able to remove it. But the effect for the users should be the same (they should stop trying to use it). Author: Marcelo Vanzin <[email protected]> Closes #17766 from vanzin/SPARK-20421. (cherry picked from commit 561e9cc) Signed-off-by: Marcelo Vanzin <[email protected]>
These listeners weren't really meant for external consumption, but they're public and marked with DeveloperApi. Adding the deprecated tag warns people that they may soon go away (as they will as part of the work for SPARK-18085). Note that not all types made public by apache#648 are being deprecated. Some remaining types are still exposed through the SparkListener API. Also note the text for StorageStatus is a tiny bit different, since I'm not so sure I'll be able to remove it. But the effect for the users should be the same (they should stop trying to use it). Author: Marcelo Vanzin <[email protected]> Closes apache#17766 from vanzin/SPARK-20421.
…Utils.unpack (apache#643) (apache#648) ### What changes were proposed in this pull request? This PR proposes to use `FileUtil.unTarUsingJava` that is a Java implementation for un-tar `.tar` files. `unTarUsingJava` is not public but it exists in all Hadoop versions from 2.1+, see HADOOP-9264. The security issue reproduction requires a non-Windows platform, and a non-gzipped TAR archive file name (contents don't matter). ### Why are the changes needed? There is a risk for arbitrary shell command injection via `Utils.unpack` when the filename is controlled by a malicious user. This is due to an issue in Hadoop's `unTar`, that is not properly escaping the filename before passing to a shell command:https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileUtil.java#L904 ### Does this PR introduce _any_ user-facing change? Yes, it prevents a security issue that, previously, allowed users to execute arbitrary shall command. ### How was this patch tested? Manually tested in local, and existing test cases should cover. Closes apache#35946 from HyukjinKwon/SPARK-38631. Authored-by: Hyukjin Kwon <[email protected]> (cherry picked from commit 057c051) Signed-off-by: Hyukjin Kwon <[email protected]> Co-authored-by: Hyukjin Kwon <[email protected]>
Hopefully this can go into 1.0, as a few people on the user list have asked for this.