[WIP] SPARK-1699: Python relative independence from the core, becomes subprojects #624
Closed
Conversation
The change adds `./yarn/stable/target/<scala-version>/classes` to the _Classpath_ when a _dependencies_ assembly is available in the assembly directory. Why is this change necessary? It eases developing features and bug fixes for Spark on YARN. [ticket: X] : NA Author : [email protected] Reviewer : ? Testing : ?
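For context, a minimal sketch of what such a conditional classpath addition could look like in a compute-classpath-style script; the directory layout comes from the description above, while the variable names and the `-deps` glob are illustrative assumptions, not the actual script.

```
#!/usr/bin/env bash
# Sketch only: variable names and paths are assumptions, not the real script.
FWDIR="$(cd "$(dirname "$0")/.." && pwd)"   # assumed Spark home
SCALA_VERSION="2.10"                        # assumed Scala version

ASSEMBLY_DIR="$FWDIR/assembly/target/scala-$SCALA_VERSION"
CLASSPATH="$FWDIR/conf"

# When only the dependencies assembly was built, also add the locally compiled
# yarn/stable classes so Spark-on-YARN changes are picked up without rebuilding
# the full assembly.
if ls "$ASSEMBLY_DIR"/spark-assembly-*-deps.jar >/dev/null 2>&1; then
  CLASSPATH="$CLASSPATH:$FWDIR/yarn/stable/target/scala-$SCALA_VERSION/classes"
fi

echo "$CLASSPATH"
```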
…ectory. Why is this change necessary? While developing in Spark I found myself rebuilding either the dependencies assembly or the full Spark assembly. I kept running into the case of having both the dep-assembly and the full-assembly in the same directory and getting an error when I called either `spark-shell` or `spark-submit`. Quick fix: rename either of them to a .bkp file, depending on the development workflow you are executing at the moment, and make `spark-class` ignore non-jar files. Another option would be to move the "offending" jar to a different directory, but in my opinion keeping them in place is a bit tidier. e.g.

```
ll ./assembly/target/scala-2.10
spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-deps.jar
spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0.jar.bkp
```

[ticket: X] : ?
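A minimal sketch of how the assembly lookup can be restricted to `*.jar` files so that a renamed `.jar.bkp` backup is ignored; the variable names and error messages are illustrative assumptions rather than the actual `spark-class` code.

```
#!/usr/bin/env bash
# Sketch only: illustrative assembly lookup, not the real spark-class logic.
ASSEMBLY_DIR="./assembly/target/scala-2.10"   # assumed location

# Only *.jar files are considered, so spark-assembly-*.jar.bkp is ignored.
jars=$(ls "$ASSEMBLY_DIR"/spark-assembly-*hadoop*.jar 2>/dev/null | grep -v -- '-deps\.jar$')
num_jars=$(echo "$jars" | grep -c .)

if [ "$num_jars" -eq 0 ]; then
  echo "No Spark assembly found in $ASSEMBLY_DIR" >&2
  exit 1
elif [ "$num_jars" -gt 1 ]; then
  echo "Found multiple Spark assembly jars in $ASSEMBLY_DIR; keep only one" >&2
  exit 1
fi
echo "Using assembly: $jars"
```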
…UNCH_COMMAND. Why is this change necessary? Most likely, when enabling `--log-conf` through `spark-shell`, you are also interested in the full invocation of the java command, including the _classpath_ and extended options. e.g.

```
INFO: Base Directory set to /Users/bernardo/work/github/berngp/spark
INFO: Spark Master is yarn-client
INFO: Spark REPL options -Dspark.logConf=true
Spark Command: /Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/bin/java -cp :/Users/bernardo/work/github/berngp/spark/conf:/Users/bernardo/work/github/berngp/spark/core/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/repl/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/mllib/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/bagel/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/graphx/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/streaming/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/tools/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/sql/catalyst/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/sql/core/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/sql/hive/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/yarn/stable/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-deps.jar:/usr/local/Cellar/hadoop/2.2.0/libexec/etc/hadoop -XX:ErrorFile=/tmp/spark-shell-hs_err_pid.log -XX:HeapDumpPath=/tmp/spark-shell-java_pid.hprof -XX:-HeapDumpOnOutOfMemoryError -XX:-PrintGC -XX:-PrintGCDetails -XX:-PrintGCTimeStamps -XX:-PrintTenuringDistribution -XX:-PrintAdaptiveSizePolicy -XX:GCLogFileSize=1024K -XX:-UseGCLogFileRotation -Xloggc:/tmp/spark-shell-gc.log -XX:+UseConcMarkSweepGC -Dspark.cleaner.ttl=10000 -Dspark.driver.host=33.33.33.1 -Dspark.logConf=true -Djava.library.path= -Xms400M -Xmx400M org.apache.spark.repl.Main
```

[ticket: X] : ?
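A minimal sketch of a launcher printing the full java invocation before exec'ing it; the `SPARK_PRINT_LAUNCH_COMMAND` flag name and the hard-coded classpath and options are assumptions for illustration, not the real `spark-class` contents.

```
#!/usr/bin/env bash
# Sketch only: illustrative launcher, not the real spark-class script.
RUNNER="${JAVA_HOME:+$JAVA_HOME/bin/}java"
CLASSPATH="conf:core/target/scala-2.10/classes"   # assumed; normally computed
JAVA_OPTS="-Dspark.logConf=true"                  # assumed extra options

CMD=("$RUNNER" -cp "$CLASSPATH" $JAVA_OPTS org.apache.spark.repl.Main "$@")

# If requested, print the complete launch command, classpath included,
# before handing control over to the JVM.
if [ -n "$SPARK_PRINT_LAUNCH_COMMAND" ]; then
  echo "Spark Command: ${CMD[*]}"
fi

exec "${CMD[@]}"
```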
Why is this change necessary? The SBT "root" project is renamed to "spark" to improve readability. Currently the assembly name is qualified with the Hadoop version but not with whether YARN has been enabled. This change qualifies the assembly so it is easy to identify whether YARN was enabled. e.g.

```
./make-distribution.sh --hadoop 2.3.0 --with-yarn
ls -l ./assembly/target/scala-2.10
spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-yarn.jar
```

vs

```
./make-distribution.sh --hadoop 2.3.0
ls -l ./assembly/target/scala-2.10
spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0.jar
```

[ticket: X] : ?
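A minimal sketch of how a distribution script could derive the qualified assembly name; the option parsing (simplified to a `--hadoop=<version>` form), defaults, and variable names are illustrative assumptions, not the real make-distribution.sh.

```
#!/usr/bin/env bash
# Sketch only: illustrative naming logic, not the real make-distribution.sh
# (option parsing is simplified to --hadoop=<version> for brevity).
VERSION="1.0.0-SNAPSHOT"       # assumed Spark version
SPARK_HADOOP_VERSION="1.0.4"   # assumed default Hadoop version
SPARK_YARN=false

for arg in "$@"; do
  case "$arg" in
    --hadoop=*)  SPARK_HADOOP_VERSION="${arg#--hadoop=}" ;;
    --with-yarn) SPARK_YARN=true ;;
  esac
done

# Qualify the assembly name with the Hadoop version, and with -yarn when enabled.
NAME="spark-assembly-$VERSION-hadoop$SPARK_HADOOP_VERSION"
if [ "$SPARK_YARN" = true ]; then
  NAME="$NAME-yarn"
fi
echo "$NAME.jar"   # e.g. spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-yarn.jar
```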
Upgraded to YARN 2.3.0, removed unnecessary `relativePath` values, and removed the incorrect version for the "org.apache.hadoop:hadoop-client" dependency in yarn/pom.xml.
…n Spark built for Hadoop 2.3.0, 2.4.0
…ad to throw a SecurityException when Spark is built for Hadoop 2.3.0, 2.4.0
Can one of the admins verify this patch?
witgo changed the title from "SPARK-1699: Python relative independence from the core, becomes subprojects" to "[WIP]SPARK-1699: Python relative independence from the core, becomes subprojects" on May 3, 2014
witgo changed the title from "[WIP]SPARK-1699: Python relative independence from the core, becomes subprojects" to "[WIP] SPARK-1699: Python relative independence from the core, becomes subprojects" on May 3, 2014
Branch is wrong, temporarily closed.
gzm55 pushed a commit to MediaV/spark that referenced this pull request on Jul 17, 2014
The original poster of this bug is @guojc, who opened a PR that preceded this one at https://github.com/apache/incubator-spark/pull/612.

ExternalAppendOnlyMap uses key hash code to order the buffer streams from which spilled files are read back into memory. When a buffer stream is empty, the default hash code for that stream is equal to Int.MaxValue. This is, however, a perfectly legitimate candidate for a key hash code. When reading from a spilled map containing such a key, a hash collision may occur, in which case we attempt to read from an empty stream and throw NoSuchElementException.

The fix is to maintain the invariant that empty buffer streams are never added back to the merge queue to be considered. This guarantees that we never read from an empty buffer stream, ever again. This PR also includes two new tests for hash collisions.

Author: Andrew Or <[email protected]>

Closes apache#624 from andrewor14/spilling-bug and squashes the following commits:

9e7263d [Andrew Or] Slightly optimize next()
2037ae2 [Andrew Or] Move a few comments around...
cf95942 [Andrew Or] Remove default value of Int.MaxValue for minKeyHash
c11f03b [Andrew Or] Fix Int.MaxValue hash collision bug in ExternalAppendOnlyMap
21c1a39 [Andrew Or] Add hash collision tests to ExternalAppendOnlyMapSuite

(cherry picked from commit fefd22f)
Signed-off-by: Patrick Wendell <[email protected]>
andrewor14 added a commit to andrewor14/spark that referenced this pull request on Jan 8, 2015
RolatZhang pushed a commit to RolatZhang/spark that referenced this pull request on Aug 18, 2023
* KE-40433 add page index filter log
* KE-40433 update parquet version
RolatZhang pushed a commit to RolatZhang/spark that referenced this pull request on Dec 8, 2023
KE-40433 add page index filter log (apache#619) (apache#624)
* KE-40433 add page index filter log
* KE-40433 update parquet version