
[WIP] SPARK-1699: Python relative independence from the core, becomes subprojects #624

Closed
wants to merge 28 commits

Conversation

witgo
Contributor

@witgo witgo commented May 3, 2014

No description provided.

berngp and others added 22 commits April 15, 2014 14:03
The change adds `./yarn/stable/target/<scala-version>/classes` to the
_classpath_ when a _dependencies_ assembly is available in the
assembly directory.

Why is this change necessary?
It eases developing features and bug fixes for Spark on YARN.
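
As a rough, hypothetical sketch of the intended classpath logic (the `FWDIR`, `ASSEMBLY_DIR`, and `SCALA_VERSION` variables are assumed to come from the existing compute-classpath script, not shown here):

```
# Hypothetical sketch: if only the dependencies assembly (…-deps.jar) is
# present, also put the YARN "stable" classes on the classpath.
if ls "$ASSEMBLY_DIR"/spark-assembly*-deps.jar >/dev/null 2>&1; then
  CLASSPATH="$CLASSPATH:$FWDIR/yarn/stable/target/scala-$SCALA_VERSION/classes"
fi
```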

[ticket: X] : NA

Author      : [email protected]
Reviewer    : ?
Testing     : ?
…ectory.

Why is this change necessary?

While developing in Spark I found myself rebuilding either the
dependencies assembly or the full spark assembly. I kept running into
the case of having both the dep-assembly and full-assembly in the same
directory and getting an error when I called either `spark-shell` or
`spark-submit`.

Quick fix: rename one of them to a .bkp file, depending on the
development workflow you are executing at the moment, and let
`spark-class` ignore non-jar files. Another option would be to move
the "offending" jar to a different directory, but in my opinion
keeping it in place is a bit tidier.

e.g.

```
ll ./assembly/target/scala-2.10
spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-deps.jar
spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0.jar.bkp
```
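
A non-authoritative sketch of the `spark-class` side (the `ASSEMBLY_DIR` variable is an assumption): only `*.jar` files are considered, so a renamed `.bkp` file is ignored.

```
# Hypothetical sketch: pick the assembly jar while ignoring non-jar files
# such as spark-assembly-*.jar.bkp left around during development.
ASSEMBLY_JAR=$(ls "$ASSEMBLY_DIR"/spark-assembly*hadoop*.jar 2>/dev/null | head -n 1)
if [ -z "$ASSEMBLY_JAR" ]; then
  echo "No assembly jar found in $ASSEMBLY_DIR" >&2
  exit 1
fi
```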

[ticket: X] : ?
…UNCH_COMMAND .

Why is this change necessary?
Most likely, when enabling `--log-conf` through `spark-shell` you are
also interested in the full invocation of the java command, including the
_classpath_ and extended options, e.g.

```
INFO: Base Directory set to /Users/bernardo/work/github/berngp/spark
INFO: Spark Master is yarn-client
INFO: Spark REPL options   -Dspark.logConf=true
Spark Command: /Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/bin/java -cp :/Users/bernardo/work/github/berngp/spark/conf:/Users/bernardo/work/github/berngp/spark/core/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/repl/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/mllib/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/bagel/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/graphx/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/streaming/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/tools/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/sql/catalyst/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/sql/core/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/sql/hive/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/yarn/stable/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-deps.jar:/usr/local/Cellar/hadoop/2.2.0/libexec/etc/hadoop -XX:ErrorFile=/tmp/spark-shell-hs_err_pid.log -XX:HeapDumpPath=/tmp/spark-shell-java_pid.hprof -XX:-HeapDumpOnOutOfMemoryError -XX:-PrintGC -XX:-PrintGCDetails -XX:-PrintGCTimeStamps -XX:-PrintTenuringDistribution -XX:-PrintAdaptiveSizePolicy -XX:GCLogFileSize=1024K -XX:-UseGCLogFileRotation -Xloggc:/tmp/spark-shell-gc.log -XX:+UseConcMarkSweepGC -Dspark.cleaner.ttl=10000 -Dspark.driver.host=33.33.33.1 -Dspark.logConf=true -Djava.library.path= -Xms400M -Xmx400M org.apache.spark.repl.Main
```
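
For comparison, a minimal way to request that output explicitly, assuming the `SPARK_PRINT_LAUNCH_COMMAND` switch honored by `spark-class` and the `SPARK_REPL_OPTS` variable read by `spark-shell`:

```
# Echo the full java launch command (classpath and JVM options) before starting the REPL.
SPARK_PRINT_LAUNCH_COMMAND=1 SPARK_REPL_OPTS="-Dspark.logConf=true" ./bin/spark-shell
```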

[ticket: X] : ?
Why is this change necessary?
Renamed the SBT "root" project to "spark" to enhance readability.

Currently the assembly name is qualified with the Hadoop version, but
not with whether YARN is enabled. This change qualifies the assembly so
that it is easy to tell whether YARN was enabled.

e.g.

```
./make-distribution.sh --hadoop 2.3.0 --with-yarn

ls -l ./assembly/target/scala-2.10
    spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-yarn.jar
```

vs

```
./make-distribution.sh --hadoop 2.3.0

ls -l ./assembly/target/scala-2.10
    spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0.jar
```
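
A rough sketch of how the qualifier could be derived inside `make-distribution.sh` (the `SPARK_YARN`, `SPARK_HADOOP_VERSION`, and `VERSION` variable names are illustrative assumptions):

```
# Hypothetical sketch: add a "-yarn" qualifier to the assembly name when YARN is enabled.
if [ "$SPARK_YARN" = "true" ]; then
  ASSEMBLY_QUALIFIER="hadoop${SPARK_HADOOP_VERSION}-yarn"
else
  ASSEMBLY_QUALIFIER="hadoop${SPARK_HADOOP_VERSION}"
fi
echo "spark-assembly-${VERSION}-${ASSEMBLY_QUALIFIER}.jar"
```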

[ticket: X] : ?
Upgraded to YARN 2.3.0, removed unnecessary `relativePath` values and
removed the incorrect version for the "org.apache.hadoop:hadoop-client"
dependency in yarn/pom.xml.
…ad to throw a SecurityException when Spark is built for Hadoop 2.3.0 or 2.4.0
@AmplabJenkins

Can one of the admins verify this patch?

@witgo witgo changed the title SPARK-1699: Python relative independence from the core, becomes subprojects [WIP]SPARK-1699: Python relative independence from the core, becomes subprojects May 3, 2014
@witgo witgo changed the title [WIP]SPARK-1699: Python relative independence from the core, becomes subprojects [WIP] SPARK-1699: Python relative independence from the core, becomes subprojects May 3, 2014
@witgo
Contributor Author

witgo commented May 3, 2014

The branch is wrong; closing this temporarily.

@witgo witgo closed this May 3, 2014
gzm55 pushed a commit to MediaV/spark that referenced this pull request Jul 17, 2014
The original poster of this bug is @guojc, who opened a PR that preceded this one at https://github.com/apache/incubator-spark/pull/612.

ExternalAppendOnlyMap uses key hash code to order the buffer streams from which spilled files are read back into memory. When a buffer stream is empty, the default hash code for that stream is equal to Int.MaxValue. This is, however, a perfectly legitimate candidate for a key hash code. When reading from a spilled map containing such a key, a hash collision may occur, in which case we attempt to read from an empty stream and throw NoSuchElementException.

The fix is to maintain the invariant that empty buffer streams are never added back to the merge queue to be considered. This guarantees that we never read from an empty buffer stream, ever again.

This PR also includes two new tests for hash collisions.

Author: Andrew Or <[email protected]>

Closes apache#624 from andrewor14/spilling-bug and squashes the following commits:

9e7263d [Andrew Or] Slightly optimize next()
2037ae2 [Andrew Or] Move a few comments around...
cf95942 [Andrew Or] Remove default value of Int.MaxValue for minKeyHash
c11f03b [Andrew Or] Fix Int.MaxValue hash collision bug in ExternalAppendOnlyMap
21c1a39 [Andrew Or] Add hash collision tests to ExternalAppendOnlyMapSuite
(cherry picked from commit fefd22f)

Signed-off-by: Patrick Wendell <[email protected]>
andrewor14 added a commit to andrewor14/spark that referenced this pull request Jan 8, 2015
@witgo witgo deleted the python-api branch March 13, 2015 09:02
RolatZhang pushed a commit to RolatZhang/spark that referenced this pull request Aug 18, 2023
* KE-40433 add page index filter log

* KE-40433 update parquet version
RolatZhang pushed a commit to RolatZhang/spark that referenced this pull request Dec 8, 2023
KE-40433 add page index filter log (apache#619) (apache#624)

* KE-40433 add page index filter log

* KE-40433 update parquet version