SPARK-3794 [CORE] Building spark core fails due to inadvertent dependency on Commons IO #2662
Conversation
QA tests have started for PR 2662 at commit
QA tests have finished for PR 2662 at commit
Test PASSed.
val cutoffTimeInMillis = (currentTimeMillis - (cutoff * 1000))
val newFiles = files.filter { _.lastModified > cutoffTimeInMillis }
newFiles.nonEmpty
throw new IllegalArgumentException("$dir is not a directory!")
this is not string interpolated (missing s)
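To illustrate the reviewer's point: in Scala, `$dir` inside a plain string literal is just four characters, not a substitution. Only a string prefixed with the `s` interpolator substitutes the variable. A minimal sketch (the path used here is made up for illustration):

```scala
val dir = new java.io.File("/tmp/not-a-dir")

// Without the s prefix, "$dir" is literal text, not interpolation:
val wrong = "$dir is not a directory!"

// With the s interpolator, $dir is replaced by dir.toString:
val right = s"$dir is not a directory!"

assert(wrong == "$dir is not a directory!")
assert(right == "/tmp/not-a-dir is not a directory!")
```

So the exception message in the diff above would print the literal text `$dir` rather than the offending path.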
Ack, sorry, look at what happens when I 'improve' a line of code. Anyone feel free to zap it or I'm about to open a related PR anyway that can fix it.
Sorry about that. I think Jenkins should be catching these kinds of build failures, though. Jenkins should attempt to build the project against multiple versions of Hadoop, since contributors may be used to using things like FileUtils and other libraries that can have incompatibility issues. I've opened https://issues.apache.org/jira/browse/SPARK-3819 to consider updating the Jenkins build process. Feel free to discuss there whether such measures are necessary.
@mccheah I agree about Jenkins catching these, but at the same time it's sort of sketchy to rely on transitive dependencies of Hadoop for exactly that reason. commons-io is not an explicit dependency of Spark, so it should be avoided.
Fair enough. I guess I didn't actually check Spark's explicit dependencies before choosing the library, so when it just magically appeared in Eclipse autocomplete I assumed it would be okay to use. Certainly my fault. The bottom line is that we could be more explicit about this. Catching it in the build would certainly be explicit; perhaps something in the documentation as well?
Remove references to Commons IO FileUtils and replace with pure Java version, which doesn't need to traverse the whole directory tree first.
I think this method could be refined further if it would be all right to rename it and its arguments and break it into two methods. I'm starting with a simple recursive version.
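To make the approach concrete, here is a hedged sketch of what such a recursive check might look like. The names `cutoffTimeInMillis` and the exception message come from the diff above; the method name `containsNewerFiles` and the overall shape are illustrative assumptions, not the PR's actual code. Unlike the Commons IO `FileUtils` approach, it short-circuits on the first qualifying file rather than materializing the whole directory tree:

```scala
import java.io.File

// Illustrative sketch (not the PR's actual code): recursively check
// whether dir contains any file modified after cutoffTimeInMillis.
// `exists` short-circuits, so traversal stops at the first hit instead
// of listing the entire tree up front.
def containsNewerFiles(dir: File, cutoffTimeInMillis: Long): Boolean = {
  if (!dir.isDirectory) {
    throw new IllegalArgumentException(s"$dir is not a directory!")
  }
  // listFiles returns null on I/O error; treat that as an empty directory
  val children = Option(dir.listFiles).getOrElse(Array.empty[File])
  children.exists { f =>
    if (f.isDirectory) containsNewerFiles(f, cutoffTimeInMillis)
    else f.lastModified > cutoffTimeInMillis
  }
}
```

Splitting the validation (`isDirectory` check) from the recursive scan would give the two-method shape suggested above.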