Update #23

Merged
merged 49 commits into from
Dec 31, 2014

Commits on Dec 24, 2014

  1. [SPARK-4881][Minor] Use SparkConf#getBoolean instead of get().toBoolean

    It's really a minor issue.
    
    In ApplicationMaster, there is code like the following:
    
        val preserveFiles = sparkConf.get("spark.yarn.preserve.staging.files", "false").toBoolean
    
    I think the code can be simplified as follows:
    
        val preserveFiles = sparkConf.getBoolean("spark.yarn.preserve.staging.files", false)
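
    To illustrate the difference between the two call styles, here is a minimal sketch. `MiniConf` is a hypothetical stand-in for `SparkConf` (it is not Spark's implementation), used only so the example runs without Spark on the classpath.

    ```scala
    object ConfSketch {
      // Hypothetical stand-in for SparkConf: just enough to compare the two styles.
      final class MiniConf(settings: Map[String, String]) {
        def get(key: String, default: String): String =
          settings.getOrElse(key, default)

        // Typed accessor: the default is a Boolean, so no string round-trip is needed.
        def getBoolean(key: String, default: Boolean): Boolean =
          settings.get(key).map(_.toBoolean).getOrElse(default)
      }

      val conf = new MiniConf(Map("spark.yarn.preserve.staging.files" -> "true"))

      // Old style: string default, then an explicit conversion.
      val viaString: Boolean =
        conf.get("spark.yarn.preserve.staging.files", "false").toBoolean

      // New style: typed default, conversion handled by the accessor.
      val viaTyped: Boolean =
        conf.getBoolean("spark.yarn.preserve.staging.files", false)

      // An unset key falls back to the typed default.
      val missing: Boolean = conf.getBoolean("some.unset.key", false)
    }
    ```

    Both styles return the same value; the typed accessor simply keeps the default and the result in the same type.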
    
    Author: Kousuke Saruta <[email protected]>
    
    Closes #3733 from sarutak/SPARK-4881 and squashes the following commits:
    
    1771430 [Kousuke Saruta] Modified the code like sparkConf.get(...).toBoolean to sparkConf.getBoolean(...)
    c63daa0 [Kousuke Saruta] Simplified code
    sarutak authored and JoshRosen committed Dec 24, 2014
    199e59a
  2. SPARK-4297 [BUILD] Build warning fixes omnibus

    There are a number of warnings generated in a normal, successful build right now. They're mostly Java unchecked cast warnings, which can be suppressed. But there's a grab bag of other Scala language warnings and so on that can all be easily fixed. The forthcoming PR fixes about 90% of the build warnings I see now.
    
    Author: Sean Owen <[email protected]>
    
    Closes #3157 from srowen/SPARK-4297 and squashes the following commits:
    
    8c9e469 [Sean Owen] Suppress unchecked cast warnings, and several other build warning fixes
    srowen authored and JoshRosen committed Dec 24, 2014
    29fabb1

Commits on Dec 25, 2014

  1. [SPARK-4873][Streaming] Use Future.zip instead of Future.flatMap(…

    …for-loop) in WriteAheadLogBasedBlockHandler
    
    Use `Future.zip` instead of `Future.flatMap`(for-loop). `zip` implies these two Futures will run concurrently, while `flatMap` usually means one Future depends on the other one.
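
    The distinction drawn above can be sketched with the Scala standard library alone. `storeInBlockManager` and `writeToLog` are hypothetical placeholders for two independent operations; they are not the actual methods in WriteAheadLogBasedBlockHandler.

    ```scala
    import scala.concurrent.{Await, Future}
    import scala.concurrent.ExecutionContext.Implicits.global
    import scala.concurrent.duration._

    object ZipSketch {
      // Hypothetical stand-ins for two independent pieces of work.
      def storeInBlockManager(): Future[String] = Future { "stored" }
      def writeToLog(): Future[Long] = Future { 42L }

      // A for-comprehension desugars to flatMap/map. Because each future is
      // created inside its generator, writeToLog() only starts after
      // storeInBlockManager() has completed.
      def sequential(): Future[(String, Long)] =
        for {
          a <- storeInBlockManager()
          b <- writeToLog()
        } yield (a, b)

      // With zip, both futures are created (and so started) before they are
      // combined, which makes the intended concurrency explicit.
      def concurrent(): Future[(String, Long)] =
        storeInBlockManager().zip(writeToLog())
    }
    ```

    Both variants produce the same pair; they differ only in when the second operation is allowed to start.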
    
    Author: zsxwing <[email protected]>
    
    Closes #3721 from zsxwing/SPARK-4873 and squashes the following commits:
    
    46a2cd9 [zsxwing] Use Future.zip instead of Future.flatMap(for-loop)
    zsxwing authored and tdas committed Dec 25, 2014
    b4d0db8
  2. [SPARK-4953][Doc] Fix the description of building Spark with YARN

    The "Specifying the Hadoop Version" section of building-spark.md describes building with YARN on Hadoop 0.23.
    Spark 1.3.0 will not support Hadoop 0.23, so we should fix the description.
    
    Author: Kousuke Saruta <[email protected]>
    
    Closes #3787 from sarutak/SPARK-4953 and squashes the following commits:
    
    ee9c355 [Kousuke Saruta] Removed description related to a specific vendor
    9ab0c24 [Kousuke Saruta] Fix the description about building SPARK with YARN
    sarutak authored and pwendell committed Dec 25, 2014
    11dd993
  3. Fix "Building Spark With Maven" link in README.md

    Corrected link to the Building Spark with Maven page from its original (http://spark.apache.org/docs/latest/building-with-maven.html) to the current page (http://spark.apache.org/docs/latest/building-spark.html)
    
    Author: Denny Lee <[email protected]>
    
    Closes #3802 from dennyglee/patch-1 and squashes the following commits:
    
    15f601a [Denny Lee] Update README.md
    dennyglee authored and JoshRosen committed Dec 25, 2014
    08b18c7
  4. [EC2] Update default Spark version to 1.2.0

    Now that 1.2.0 is out, let's update the default Spark version.
    
    Author: Nicholas Chammas <[email protected]>
    
    Closes #3793 from nchammas/patch-1 and squashes the following commits:
    
    3255832 [Nicholas Chammas] add 1.2.0 version to Spark-Shark map
    ec0e904 [Nicholas Chammas] [EC2] Update default Spark version to 1.2.0
    nchammas authored and JoshRosen committed Dec 25, 2014
    b6b6393
  5. [EC2] Update mesos/spark-ec2 branch to branch-1.3

    Going forward, we'll use matching branch names across the mesos/spark-ec2 and apache/spark repositories, per [the discussion here](mesos/spark-ec2#85 (comment)).
    
    Author: Nicholas Chammas <[email protected]>
    
    Closes #3804 from nchammas/patch-2 and squashes the following commits:
    
    cd2c0d4 [Nicholas Chammas] [EC2] Update mesos/spark-ec2 branch to branch-1.3
    nchammas authored and JoshRosen committed Dec 25, 2014
    ac82785

Commits on Dec 26, 2014

  1. [SPARK-4537][Streaming] Expand StreamingSource to add more metrics

    Add `processingDelay`, `schedulingDelay` and `totalDelay` for the last completed batch. Add `lastReceivedBatchRecords` and `totalReceivedBatchRecords` to the received records counting.
    
    Author: jerryshao <[email protected]>
    
    Closes #3466 from jerryshao/SPARK-4537 and squashes the following commits:
    
    00f5f7f [jerryshao] Change the code style and add totalProcessedRecords
    44721a6 [jerryshao] Further address the comments
    c097ddc [jerryshao] Address the comments
    02dd44f [jerryshao] Fix the addressed comments
    c7a9376 [jerryshao] Expand StreamingSource to add more metrics
    jerryshao authored and tdas committed Dec 26, 2014
    f205fe4
  2. [SPARK-4608][Streaming] Reorganize StreamingContext implicit to impro…

    …ve API convenience
    
    There is only one implicit function `toPairDStreamFunctions` in `StreamingContext`. This PR does a reorganization similar to [SPARK-4397](https://issues.apache.org/jira/browse/SPARK-4397).
    
    Compiled the following code with Spark Streaming 1.1.0 and ran it with this PR. Everything works fine.
    ```Scala
    import org.apache.spark._
    import org.apache.spark.streaming._
    import org.apache.spark.streaming.StreamingContext._
    
    object StreamingApp {
    
      def main(args: Array[String]) {
        val conf = new SparkConf().setMaster("local[2]").setAppName("FileWordCount")
        val ssc = new StreamingContext(conf, Seconds(10))
        val lines = ssc.textFileStream("/some/path")
        val words = lines.flatMap(_.split(" "))
        val pairs = words.map(word => (word, 1))
        val wordCounts = pairs.reduceByKey(_ + _)
        wordCounts.print()
    
        ssc.start()
        ssc.awaitTermination()
      }
    }
    ```
    
    Author: zsxwing <[email protected]>
    
    Closes #3464 from zsxwing/SPARK-4608 and squashes the following commits:
    
    aa6d44a [zsxwing] Fix a copy-paste error
    f74c190 [zsxwing] Merge branch 'master' into SPARK-4608
    e6f9cc9 [zsxwing] Update the docs
    27833bb [zsxwing] Remove `import StreamingContext._`
    c15162c [zsxwing] Reorganize StreamingContext implicit to improve API convenience
    zsxwing authored and tdas committed Dec 26, 2014
    f9ed2b6
  3. SPARK-4971: Fix typo in BlockGenerator comment

    Author: CodingCat <[email protected]>
    
    Closes #3807 from CodingCat/new_branch and squashes the following commits:
    
    5167f01 [CodingCat] fix typo in the comment
    CodingCat authored and JoshRosen committed Dec 26, 2014
    fda4331

Commits on Dec 27, 2014

  1. MAINTENANCE: Automated closing of pull requests.

    This commit exists to close the following pull requests on Github:
    
    Closes #3456 (close requested by 'pwendell')
    Closes #1602 (close requested by 'tdas')
    Closes #2633 (close requested by 'tdas')
    Closes #2059 (close requested by 'JoshRosen')
    Closes #2348 (close requested by 'tdas')
    Closes #3662 (close requested by 'tdas')
    Closes #2031 (close requested by 'andrewor14')
    Closes #265 (close requested by 'JoshRosen')
    pwendell committed Dec 27, 2014
    534f24b
  2. [SPARK-3787][BUILD] Assembly jar name is wrong when we build with sbt…

    … omitting -Dhadoop.version
    
    This PR is another solution for the following problem. When we build with sbt using a Hadoop profile but without the hadoop.version property, like:
    
        sbt/sbt -Phadoop-2.2 assembly
    
    the jar name always uses the default version (1.0.4).
    
    When we build with Maven under the same conditions, the default version for each profile is used.
    For instance, if we build like:
    
        mvn -Phadoop-2.2 package
    
    the jar name uses hadoop2.2.0, the default version for hadoop-2.2.
    
    Author: Kousuke Saruta <[email protected]>
    
    Closes #3046 from sarutak/fix-assembly-jarname-2 and squashes the following commits:
    
    41ef90e [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into fix-assembly-jarname-2
    50c8676 [Kousuke Saruta] Merge branch 'fix-assembly-jarname-2' of github.com:sarutak/spark into fix-assembly-jarname-2
    52a1cd2 [Kousuke Saruta] Fixed comflicts
    dd30768 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into fix-assembly-jarname2
    f1c90bb [Kousuke Saruta] Fixed SparkBuild.scala in order to read `hadoop.version` property from pom.xml
    af6b100 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into fix-assembly-jarname
    c81806b [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into fix-assembly-jarname
    ad1f96e [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into fix-assembly-jarname
    b2318eb [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into fix-assembly-jarname
    5fc1259 [Kousuke Saruta] Fixed typo.
    eebbb7d [Kousuke Saruta] Fixed wrong jar name
    sarutak authored and pwendell committed Dec 27, 2014
    de95c57
  3. HOTFIX: Slight tweak on previous commit.

    Meant to merge this in when committing SPARK-3787.
    pwendell committed Dec 27, 2014
    82bf4be
  4. [SPARK-3955] Different versions between jackson-mapper-asl and jackson-core-asl
    
    - Set jackson-mapper-asl and jackson-core-asl to the same version
    - Related to #2818
    - Re-created the same patch against the latest master
    
    Author: Jongyoul Lee <[email protected]>
    
    Closes #3716 from jongyoul/SPARK-3955 and squashes the following commits:
    
    efa29aa [Jongyoul Lee] [SPARK-3955] Different versions between jackson-mapper-asl and jackson-core-asl - set the same version to jackson-mapper-asl and jackson-core-asl
    jongyoul authored and pwendell committed Dec 27, 2014
    2483c1e
  5. [SPARK-4954][Core] add spark version information in log for standalone…

    … mode
    
    The Spark version of the master and worker may differ from the driver's Spark version, because the Spark jar file might be replaced for a new application without restarting the Spark cluster. So the Spark version should be logged in both the Master and Worker logs.
    
    Author: Zhang, Liye <[email protected]>
    
    Closes #3790 from liyezhang556520/version4Standalone and squashes the following commits:
    
    e05e1e3 [Zhang, Liye] add spark version infomation in log for standalone mode
    liyezhang556520 authored and pwendell committed Dec 27, 2014
    786808a
  6. [SPARK-4952][Core]Handle ConcurrentModificationExceptions in SparkEnv…

    ….environmentDetails
    
    Author: GuoQiang Li <[email protected]>
    
    Closes #3788 from witgo/SPARK-4952 and squashes the following commits:
    
    d903529 [GuoQiang Li] Handle ConcurrentModificationExceptions in SparkEnv.environmentDetails
    witgo authored and pwendell committed Dec 27, 2014
    080ceb7
  7. [SPARK-4501][Core] - Create build/mvn to automatically download maven…

    …/zinc/scalac
    
    Creates a top-level script (`build/mvn`) that automatically downloads zinc and the specific version of Scala used, to make building Spark easier. It also downloads and installs Maven if the user doesn't already have it; all packages are hosted under the `build/` directory. Tested on both Linux and OS X, and both work. All commands pass through to the Maven binary, so it acts exactly as a traditional Maven call would.
    
    Author: Brennon York <[email protected]>
    
    Closes #3707 from brennonyork/SPARK-4501 and squashes the following commits:
    
    0e5a0e4 [Brennon York] minor incorrect doc verbage (with -> this)
    9b79e38 [Brennon York] fixed merge conflicts with dev/run-tests, properly quoted args in sbt/sbt, fixed bug where relative paths would fail if passed in from build/mvn
    d2d41b6 [Brennon York] added blurb about leverging zinc with build/mvn
    b979c58 [Brennon York] updated the merge conflict
    c5634de [Brennon York] updated documentation to overview build/mvn, updated all points where sbt/sbt was referenced with build/sbt
    b8437ba [Brennon York] set progress bars for curl and wget when not run on jenkins, no progress bar when run on jenkins, moved sbt script to build/sbt, wrote stub and warning under sbt/sbt which calls build/sbt, modified build/sbt to use the correct directory, fixed bug in build/sbt-launch-lib.bash to correctly pull the sbt version
    be11317 [Brennon York] added switch to silence download progress only if AMPLAB_JENKINS is set
    28d0a99 [Brennon York] updated to remove the python dependency, uses grep instead
    7e785a6 [Brennon York] added silent and quiet flags to curl and wget respectively, added single echo output to denote start of a download if download is needed
    14a5da0 [Brennon York] removed unnecessary zinc output on startup
    1af4a94 [Brennon York] fixed bug with uppercase vs lowercase variable
    3e8b9b3 [Brennon York] updated to properly only restart zinc if it was freshly installed
    a680d12 [Brennon York] Added comments to functions and tested various mvn calls
    bb8cc9d [Brennon York] removed package files
    ef017e6 [Brennon York] removed OS complexities, setup generic install_app call, removed extra file complexities, removed help, removed forced install (defaults now), removed double-dash from cli
    07bf018 [Brennon York] Updated to specifically handle pulling down the correct scala version
    f914dea [Brennon York] Beginning final portions of localized scala home
    69c4e44 [Brennon York] working linux and osx installers for purely local mvn build
    4a1609c [Brennon York] finalizing working linux install for maven to local ./build/apache-maven folder
    cbfcc68 [Brennon York] Changed the default sbt/sbt to build/sbt and added a build/mvn which will automatically download, install, and execute maven with zinc for easier build capability
    Brennon York authored and pwendell committed Dec 27, 2014
    a3e51cc

Commits on Dec 29, 2014

  1. [SPARK-4966][YARN] The MemoryOverhead value is not set correctly

    Author: meiyoula <[email protected]>
    
    Closes #3797 from XuTingjun/MemoryOverhead and squashes the following commits:
    
    5a780fc [meiyoula] Update ClientArguments.scala
    XuTingjun authored and tgravescs committed Dec 29, 2014
    14fa87b
  2. [SPARK-4982][DOC] spark.ui.retainedJobs description is wrong in Spa…

    …rk UI configuration guide
    
    Author: wangxiaojing <[email protected]>
    
    Closes #3818 from wangxiaojing/SPARK-4982 and squashes the following commits:
    
    fe2ad5f [wangxiaojing] change stages to jobs
    wangxiaojing authored and JoshRosen committed Dec 29, 2014
    6645e52
  3. Added LICENSE Header to build/mvn, build/sbt and sbt/sbt

    Recently, build/mvn and build/sbt were added and sbt/sbt was changed, but they have no license headers. Shouldn't we add license headers to these scripts?
    If that's not right, please correct me.
    
    This PR doesn't affect the behavior of Spark, so I didn't file a JIRA.
    
    Author: Kousuke Saruta <[email protected]>
    
    Closes #3817 from sarutak/add-license-header and squashes the following commits:
    
    1abc972 [Kousuke Saruta] Added LICENSE Header
    sarutak authored and JoshRosen committed Dec 29, 2014
    4cef05e
  4. [SPARK-4946] [CORE] Using AkkaUtils.askWithReply in MapOutputTracker.…

    …askTracker to reduce the chance of the communicating problem
    
    Using AkkaUtils.askWithReply in MapOutputTracker.askTracker to reduce the chance of the communicating problem
    
    Author: YanTangZhai <[email protected]>
    Author: yantangzhai <[email protected]>
    
    Closes #3785 from YanTangZhai/SPARK-4946 and squashes the following commits:
    
    9ca6541 [yantangzhai] [SPARK-4946] [CORE] Using AkkaUtils.askWithReply in MapOutputTracker.askTracker to reduce the chance of the communicating problem
    e4c2c0a [YanTangZhai] Merge pull request #15 from apache/master
    718afeb [YanTangZhai] Merge pull request #12 from apache/master
    6e643f8 [YanTangZhai] Merge pull request #11 from apache/master
    e249846 [YanTangZhai] Merge pull request #10 from apache/master
    d26d982 [YanTangZhai] Merge pull request #9 from apache/master
    76d4027 [YanTangZhai] Merge pull request #8 from apache/master
    03b62b0 [YanTangZhai] Merge pull request #7 from apache/master
    8a00106 [YanTangZhai] Merge pull request #6 from apache/master
    cbcba66 [YanTangZhai] Merge pull request #3 from apache/master
    cdef539 [YanTangZhai] Merge pull request #1 from apache/master
    YanTangZhai authored and JoshRosen committed Dec 29, 2014
    815de54
  5. [Minor] Fix a typo of type parameter in JavaUtils.scala

    In JavaUtils.scala, there is a typo in a type parameter. In addition, the type information is removed at compile time by erasure.
    
    This issue is really minor, so I didn't file it in JIRA.
    
    Author: Kousuke Saruta <[email protected]>
    
    Closes #3789 from sarutak/fix-typo-in-javautils and squashes the following commits:
    
    e20193d [Kousuke Saruta] Fixed a typo of type parameter
    82bc5d9 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into fix-typo-in-javautils
    99f6f63 [Kousuke Saruta] Fixed a typo of type parameter in JavaUtils.scala
    sarutak authored and rxin committed Dec 29, 2014
    8d72341
  6. [SPARK-4409][MLlib] Additional Linear Algebra Utils

    Addition of a very limited number of local matrix manipulation and generation methods that would be helpful in the further development for algorithms on top of BlockMatrix (SPARK-3974), such as Randomized SVD, and Multi Model Training (SPARK-1486).
    The proposed methods for addition are:
    
    For `Matrix`
     - map: maps the values in the matrix with a given function. Produces a new matrix.
     - update: the values in the matrix are updated with a given function. Occurs in place.
    
    Factory methods for `DenseMatrix`:
     - *zeros: Generate a matrix consisting of zeros
     - *ones: Generate a matrix consisting of ones
     - *eye: Generate an identity matrix
     - *rand: Generate a matrix consisting of i.i.d. uniform random numbers
     - *randn: Generate a matrix consisting of i.i.d. gaussian random numbers
     - *diag: Generate a diagonal matrix from a supplied vector
    *These methods already exist in the factory methods for `Matrices`, however for cases where we require a `DenseMatrix`, you constantly have to add `.asInstanceOf[DenseMatrix]` everywhere, which makes the code "dirtier". I propose moving these functions to factory methods for `DenseMatrix` where the output will be a `DenseMatrix` and the factory methods for `Matrices` will call these functions directly and output a generic `Matrix`.
    
    Factory methods for `SparseMatrix`:
     - speye: Identity matrix in sparse format. Saves a ton of memory when dimensions are large, especially in Multi Model Training, where each row requires being multiplied by a scalar.
     - sprand: Generate a sparse matrix with a given density consisting of i.i.d. uniform random numbers.
     - sprandn: Generate a sparse matrix with a given density consisting of i.i.d. gaussian random numbers.
     - diag: Generate a diagonal matrix from a supplied vector, but is memory efficient, because it just stores the diagonal. Again, very helpful in Multi Model Training.
    
    Factory methods for `Matrices`:
     - Include all the factory methods given above, but return a generic `Matrix` rather than `SparseMatrix` or `DenseMatrix`.
     - horzCat: Horizontally concatenate matrices to form one larger matrix. Very useful in both Multi Model Training, and for the repartitioning of BlockMatrix.
     - vertCat: Vertically concatenate matrices to form one larger matrix. Very useful for the repartitioning of BlockMatrix.
    
    The names for these methods were selected from MATLAB
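
    To make the semantics of a few of these methods concrete, here is a small sketch in plain Scala. `Dense` is a hypothetical, simplified matrix type that stores its values in column-major order; it is not the MLlib `DenseMatrix` class, and the function bodies are illustrative rather than the code added by this PR.

    ```scala
    object MatrixSketch {
      // Hypothetical minimal dense matrix; values are stored column by column.
      final case class Dense(numRows: Int, numCols: Int, values: Array[Double]) {
        require(values.length == numRows * numCols, "values must fill the matrix")
        def apply(i: Int, j: Int): Double = values(j * numRows + i)
        // map: produce a new matrix by applying f to every entry.
        def map(f: Double => Double): Dense = Dense(numRows, numCols, values.map(f))
      }

      // eye: n x n identity matrix.
      def eye(n: Int): Dense = {
        val v = new Array[Double](n * n)
        for (i <- 0 until n) v(i * n + i) = 1.0
        Dense(n, n, v)
      }

      // horzCat: with column-major storage, placing matrices side by side
      // is just concatenating their value arrays.
      def horzCat(ms: Seq[Dense]): Dense = {
        require(ms.nonEmpty && ms.forall(_.numRows == ms.head.numRows))
        Dense(ms.head.numRows, ms.map(_.numCols).sum, ms.flatMap(_.values.toSeq).toArray)
      }

      // vertCat: for each column, stack that column's slice from every matrix.
      def vertCat(ms: Seq[Dense]): Dense = {
        require(ms.nonEmpty && ms.forall(_.numCols == ms.head.numCols))
        val cols = ms.head.numCols
        val values = (0 until cols).flatMap { j =>
          ms.flatMap(m => m.values.slice(j * m.numRows, (j + 1) * m.numRows).toSeq)
        }.toArray
        Dense(ms.map(_.numRows).sum, cols, values)
      }
    }
    ```

    For example, `horzCat(Seq(eye(2), Dense(2, 1, Array(5.0, 6.0))))` yields a 2 x 3 matrix whose last column is (5.0, 6.0).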
    
    Author: Burak Yavuz <[email protected]>
    Author: Xiangrui Meng <[email protected]>
    
    Closes #3319 from brkyvz/SPARK-4409 and squashes the following commits:
    
    b0354f6 [Burak Yavuz] [SPARK-4409] Incorporated mengxr's code
    04c4829 [Burak Yavuz] Merge pull request #1 from mengxr/SPARK-4409
    80cfa29 [Xiangrui Meng] minor changes
    ecc937a [Xiangrui Meng] update sprand
    4e95e24 [Xiangrui Meng] simplify fromCOO implementation
    10a63a6 [Burak Yavuz] [SPARK-4409] Fourth pass of code review
    f62d6c7 [Burak Yavuz] [SPARK-4409] Modified genRandMatrix
    3971c93 [Burak Yavuz] [SPARK-4409] Third pass of code review
    75239f8 [Burak Yavuz] [SPARK-4409] Second pass of code review
    e4bd0c0 [Burak Yavuz] [SPARK-4409] Modified horzcat and vertcat
    65c562e [Burak Yavuz] [SPARK-4409] Hopefully fixed Java Test
    d8be7bc [Burak Yavuz] [SPARK-4409] Organized imports
    065b531 [Burak Yavuz] [SPARK-4409] First pass after code review
    a8120d2 [Burak Yavuz] [SPARK-4409] Finished updates to API according to SPARK-4614
    f798c82 [Burak Yavuz] [SPARK-4409] Updated API according to SPARK-4614
    c75f3cd [Burak Yavuz] [SPARK-4409] Added JavaAPI Tests, and fixed a couple of bugs
    d662f9d [Burak Yavuz] [SPARK-4409] Modified according to remote repo
    83dfe37 [Burak Yavuz] [SPARK-4409] Scalastyle error fixed
    a14c0da [Burak Yavuz] [SPARK-4409] Initial commit to add methods
    brkyvz authored and mengxr committed Dec 29, 2014
    02b55de
  7. SPARK-4968: takeOrdered to skip reduce step in case mappers return no…

    … partitions
    
    takeOrdered should skip the reduce step when the mapped RDDs have no partitions. This prevents the following exception:
    
    4. run query
    SELECT * FROM testTable WHERE market = 'market2' ORDER BY End_Time DESC LIMIT 100;
    Error trace
    java.lang.UnsupportedOperationException: empty collection
    at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:863)
    at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:863)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.reduce(RDD.scala:863)
    at org.apache.spark.rdd.RDD.takeOrdered(RDD.scala:1136)
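
    The failure mode and the guard can be sketched without Spark. In the example below each inner `Seq` stands in for one partition; this is an assumption made for illustration and not the actual `RDD.takeOrdered` code.

    ```scala
    object TakeOrderedSketch {
      // Each inner Seq is a stand-in for one partition of the mapped RDD.
      def takeOrdered(partitions: Seq[Seq[Int]], num: Int): Seq[Int] = {
        // Map step: keep the smallest `num` elements of every partition.
        val perPartition = partitions.map(_.sorted.take(num))
        // Guard: reduce over zero partitions would throw
        // UnsupportedOperationException, so return an empty result instead.
        if (perPartition.isEmpty) Seq.empty
        else perPartition.reduce((a, b) => (a ++ b).sorted.take(num))
      }
    }
    ```

    With the guard in place, `takeOrdered(Seq.empty, 100)` returns an empty sequence instead of failing.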
    
    Author: Yash Datta <[email protected]>
    
    Closes #3830 from saucam/fix_takeorder and squashes the following commits:
    
    5974d10 [Yash Datta] SPARK-4968: takeOrdered to skip reduce step in case mappers return no partitions
    Yash Datta authored and rxin committed Dec 29, 2014
    9bc0df6
  8. SPARK-4156 [MLLIB] EM algorithm for GMMs

    Implementation of Expectation-Maximization for Gaussian Mixture Models.
    
    This is my maiden contribution to Apache Spark, so I apologize now if I have done anything incorrectly; having said that, this work is my own, and I offer it to the project under the project's open source license.
    
    Author: Travis Galoppo <[email protected]>
    Author: Travis Galoppo <[email protected]>
    Author: tgaloppo <[email protected]>
    Author: FlytxtRnD <[email protected]>
    
    Closes #3022 from tgaloppo/master and squashes the following commits:
    
    aaa8f25 [Travis Galoppo] MLUtils: changed privacy of EPSILON from [util] to [mllib]
    709e4bf [Travis Galoppo] fixed usage line to include optional maxIterations parameter
    acf1fba [Travis Galoppo] Fixed parameter comment in GaussianMixtureModel Made maximum iterations an optional parameter to DenseGmmEM
    9b2fc2a [Travis Galoppo] Style improvements Changed ExpectationSum to a private class
    b97fe00 [Travis Galoppo] Minor fixes and tweaks.
    1de73f3 [Travis Galoppo] Removed redundant array from array creation
    578c2d1 [Travis Galoppo] Removed unused import
    227ad66 [Travis Galoppo] Moved prediction methods into model class.
    308c8ad [Travis Galoppo] Numerous changes to improve code
    cff73e0 [Travis Galoppo] Replaced accumulators with RDD.aggregate
    20ebca1 [Travis Galoppo] Removed unusued code
    42b2142 [Travis Galoppo] Added functionality to allow setting of GMM starting point. Added two cluster test to testing suite.
    8b633f3 [Travis Galoppo] Style issue
    9be2534 [Travis Galoppo] Style issue
    d695034 [Travis Galoppo] Fixed style issues
    c3b8ce0 [Travis Galoppo] Merge branch 'master' of https://github.com/tgaloppo/spark   Adds predict() method
    2df336b [Travis Galoppo] Fixed style issue
    b99ecc4 [tgaloppo] Merge pull request #1 from FlytxtRnD/predictBranch
    f407b4c [FlytxtRnD] Added predict() to return the cluster labels and membership values
    97044cf [Travis Galoppo] Fixed style issues
    dc9c742 [Travis Galoppo] Moved MultivariateGaussian utility class
    e7d413b [Travis Galoppo] Moved multivariate Gaussian utility class to mllib/stat/impl Improved comments
    9770261 [Travis Galoppo] Corrected a variety of style and naming issues.
    8aaa17d [Travis Galoppo] Added additional train() method to companion object for cluster count and tolerance parameters.
    676e523 [Travis Galoppo] Fixed to no longer ignore delta value provided on command line
    e6ea805 [Travis Galoppo] Merged with master branch; update test suite with latest context changes. Improved cluster initialization strategy.
    86fb382 [Travis Galoppo] Merge remote-tracking branch 'upstream/master'
    719d8cc [Travis Galoppo] Added scala test suite with basic test
    c1a8e16 [Travis Galoppo] Made GaussianMixtureModel class serializable Modified sum function for better performance
    5c96c57 [Travis Galoppo] Merge remote-tracking branch 'upstream/master'
    c15405c [Travis Galoppo] SPARK-4156
    tgaloppo authored and mengxr committed Dec 29, 2014
    6cf6fdf
  9. Added setMinCount to Word2Vec.scala

    Wanted to customize the private minCount variable in the Word2Vec class. Added
    a method to do so.
    
    Author: ganonp <[email protected]>
    
    Closes #3693 from ganonp/my-custom-spark and squashes the following commits:
    
    ad534f2 [ganonp] made norm method public
    5110a6f [ganonp] Reorganized
    854958b [ganonp] Fixed Indentation for setMinCount
    12ed8f9 [ganonp] Update Word2Vec.scala
    76bdf5a [ganonp] Update Word2Vec.scala
    ffb88bb [ganonp] Update Word2Vec.scala
    5eb9100 [ganonp] Added setMinCount to Word2Vec.scala
    ganonp authored and mengxr committed Dec 29, 2014
    343db39

Commits on Dec 30, 2014

  1. [SPARK-4972][MLlib] Updated the scala doc for lasso and ridge regress…

    …ion for the change of LeastSquaresGradient
    
    In SPARK-4907, we added a factor of 2 to LeastSquaresGradient. This PR updates the Scaladoc for lasso and ridge regression accordingly.
    
    Author: DB Tsai <[email protected]>
    
    Closes #3808 from dbtsai/doc and squashes the following commits:
    
    ec3c989 [DB Tsai] first commit
    DB Tsai authored and mengxr committed Dec 30, 2014
    040d6f2
  2. [SPARK-4920][UI] add version on master and worker page for standalone…

    … mode
    
    Author: Zhang, Liye <[email protected]>
    
    Closes #3769 from liyezhang556520/spark-4920_WebVersion and squashes the following commits:
    
    3bb7e0d [Zhang, Liye] add version on master and worker page
    liyezhang556520 authored and JoshRosen committed Dec 30, 2014
    9077e72
  3. [SPARK-4882] Register PythonBroadcast with Kryo so that PySpark works…

    … with KryoSerializer
    
    This PR fixes an issue where PySpark broadcast variables caused NullPointerExceptions if KryoSerializer was used.  The fix is to register PythonBroadcast with Kryo so that it's deserialized with a KryoJavaSerializer.
    
    Author: Josh Rosen <[email protected]>
    
    Closes #3831 from JoshRosen/SPARK-4882 and squashes the following commits:
    
    0466c7a [Josh Rosen] Register PythonBroadcast with Kryo.
    d5b409f [Josh Rosen] Enable registrationRequired, which would have caught this bug.
    069d8a7 [Josh Rosen] Add failing test for SPARK-4882
    JoshRosen committed Dec 30, 2014
    efa80a5
  4. [SPARK-4908][SQL] Prevent multiple concurrent hive native commands

    This is just a quick fix that locks when calling `runHive`.  If we can find a way to avoid the error without a global lock that would be better.
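    
    A minimal sketch of the global-lock approach described above, assuming a hypothetical `SerializedRunner` helper rather than the actual `runHive` code: every caller passes its command through one shared monitor, so at most one command runs at a time.
    
    ```scala
    // Hypothetical helper for illustration; not the code touched by this patch.
    object SerializedRunner {
      // Single process-wide monitor shared by all callers.
      private val lock = new Object

      // Evaluates `command` while holding the global lock and returns its result.
      def runSerialized[T](command: => T): T = lock.synchronized {
        command
      }
    }
    ```
    
    The trade-off is the one the message itself notes: a global lock is simple and safe, but it removes all concurrency between native commands.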
    
    Author: Michael Armbrust <[email protected]>
    
    Closes #3834 from marmbrus/hiveConcurrency and squashes the following commits:
    
    bf25300 [Michael Armbrust] prevent multiple concurrent hive native commands
    marmbrus committed Dec 30, 2014
    480bd1d
  5. [SQL] enable view test

    This is a follow-up to #3396; it just adds a test to the whitelist.
    
    Author: Daoyuan Wang <[email protected]>
    
    Closes #3826 from adrian-wang/viewtest and squashes the following commits:
    
    f105f68 [Daoyuan Wang] enable view test
    adrian-wang authored and marmbrus committed Dec 30, 2014
    94d60b7
  6. [SPARK-4975][SQL] Fix HiveInspectorSuite test failure

    HiveInspectorSuite test failure:
    [info] - wrap / unwrap null, constant null and writables *** FAILED *** (21 milliseconds)
    [info] 1 did not equal 0 (HiveInspectorSuite.scala:136)
    This is because the original date (3914-10-23) does not equal the date returned by ```unwrap``` (3914-10-22).
    
    Setting the TimeZone and Locale fixes this.
    Another minor change renames ```def checkValues(v1: Any, v2: Any): Unit``` to ```def checkValue(v1: Any, v2: Any): Unit``` to make the code clearer.
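    
    The idea of pinning the default time zone and locale for a test can be sketched as follows. The `FixedLocaleAndZone` helper and the zone and locale passed to it are assumptions for illustration; the source does not say which values the patch uses.
    
    ```scala
    import java.util.{Locale, TimeZone}

    // Hypothetical test helper; the name and signature are illustrative.
    object FixedLocaleAndZone {
      // Runs `body` with the given JVM-wide defaults, then restores the previous ones.
      def withFixedDefaults[T](zoneId: String, locale: Locale)(body: => T): T = {
        val savedZone = TimeZone.getDefault
        val savedLocale = Locale.getDefault
        TimeZone.setDefault(TimeZone.getTimeZone(zoneId))
        Locale.setDefault(locale)
        try body
        finally {
          // Restore even if the body throws, so later tests are unaffected.
          TimeZone.setDefault(savedZone)
          Locale.setDefault(savedLocale)
        }
      }
    }
    ```
    
    Date conversions inside `body` then no longer depend on the machine the test happens to run on.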
    
    Author: scwf <[email protected]>
    Author: Fei Wang <[email protected]>
    
    Closes #3814 from scwf/fix-inspectorsuite and squashes the following commits:
    
    d8531ef [Fei Wang] Delete test.log
    72b19a9 [scwf] fix HiveInspectorSuite test error
    scwf authored and marmbrus committed Dec 30, 2014
    65357f1
  7. [SPARK-4959] [SQL] Attributes are case sensitive when using a select query from a projection
    
    Author: Cheng Hao <[email protected]>
    
    Closes #3796 from chenghao-intel/spark_4959 and squashes the following commits:
    
    3ec08f8 [Cheng Hao] Replace the attribute in comparing its exprId other than itself
    chenghao-intel authored and marmbrus committed Dec 30, 2014
    5595eaa
  8. [SPARK-4904] [SQL] Remove the unnecessary code change in Generic UDF

    Now that #3429 has been merged, the bug in wrapping to Writable for HiveGenericUDF is resolved, so we can safely remove the foldable check in `HiveGenericUdf.eval`, which was discussed in #2802.
    
    Author: Cheng Hao <[email protected]>
    
    Closes #3745 from chenghao-intel/generic_udf and squashes the following commits:
    
    622ad03 [Cheng Hao] Remove the unnecessary code change in Generic UDF
    chenghao-intel authored and marmbrus committed Dec 30, 2014
    63b84b7
  9. [SPARK-5002][SQL] Using ascending by default when not specify order in order by

    Spark SQL does not support ```SELECT a, b FROM testData2 ORDER BY a desc, b```, where no sort direction is given for `b`.
    
    Author: wangfei <[email protected]>
    
    Closes #3838 from scwf/orderby and squashes the following commits:
    
    114b64a [wangfei] remove nouse methods
    48145d3 [wangfei] fix order, using asc by default
    scwf authored and marmbrus committed Dec 30, 2014
    daac221
  10. [Spark-4512] [SQL] Unresolved Attribute Exception in Sort By

    An exception is thrown when running a query like:
    SELECT key+key FROM src sort by value;
    
    Author: Cheng Hao <[email protected]>
    
    Closes #3386 from chenghao-intel/sort and squashes the following commits:
    
    38c78cc [Cheng Hao] revert the SortPartition in SparkStrategies
    7e9dd15 [Cheng Hao] update the typo
    fcd1d64 [Cheng Hao] rebase the latest master and update the SortBy unit test
    chenghao-intel authored and marmbrus committed Dec 30, 2014
    53f0a00
  11. [SPARK-4493][SQL] Tests for IsNull / IsNotNull in the ParquetFilterSuite

    This is a follow-up of #3367 and #3644.
    
    At the time #3644 was written, #3367 hadn't been merged yet, so `IsNull` and `IsNotNull` filters were not covered in the first version of `ParquetFilterSuite`. This PR adds the corresponding test cases.
    
    Author: Cheng Lian <[email protected]>
    
    Closes #3748 from liancheng/test-null-filters and squashes the following commits:
    
    1ab943f [Cheng Lian] IsNull and IsNotNull Parquet filter test case for boolean type
    bcd616b [Cheng Lian] Adds Parquet filter pushedown tests for IsNull and IsNotNull
    liancheng authored and marmbrus committed Dec 30, 2014
    19a8802
  12. [SPARK-4916][SQL][DOCS]Update SQL programming guide about cache section

    `SchemaRDD.cache()` now uses in-memory columnar storage.
    
    Author: luogankun <[email protected]>
    
    Closes #3759 from luogankun/SPARK-4916 and squashes the following commits:
    
    7b39864 [luogankun] [SPARK-4916]Update SQL programming guide
    6018122 [luogankun] Merge branch 'master' of https://github.com/apache/spark into SPARK-4916
    0b93785 [luogankun] [SPARK-4916]Update SQL programming guide
    99b2336 [luogankun] [SPARK-4916]Update SQL programming guide
    luogankun authored and marmbrus committed Dec 30, 2014
    f7a41a0
  13. [SPARK-4930][SQL][DOCS]Update SQL programming guide, CACHE TABLE is eager

    `CACHE TABLE tbl` is now __eager__ by default, not __lazy__.
    
    Author: luogankun <[email protected]>
    
    Closes #3773 from luogankun/SPARK-4930 and squashes the following commits:
    
    cc17b7d [luogankun] [SPARK-4930][SQL][DOCS]Update SQL programming guide, add CACHE [LAZY] TABLE [AS SELECT] ...
    bffe0e8 [luogankun] [SPARK-4930][SQL][DOCS]Update SQL programming guide, CACHE TABLE tbl is eager
    luogankun authored and marmbrus committed Dec 30, 2014
    2deac74
  14. [SPARK-4928][SQL] Fix: Operator '>,<,>=,<=' with decimal between different precision report error

    When a comparison operator is applied to decimals of different precision, we need to change both operands to unlimited precision.
    
    Author: guowei2 <[email protected]>
    
    Closes #3767 from guowei2/SPARK-4928 and squashes the following commits:
    
    c6a6e3e [guowei2] fix code style
    3214e0a [guowei2] add test case
    b4985a2 [guowei2] fix code style
    27adf42 [guowei2] Fix: Operation '>,<,>=,<=' with Decimal report error
    guowei2 authored and marmbrus committed Dec 30, 2014
    a75dd83
  15. [SPARK-4937][SQL] Normalizes conjunctions and disjunctions to eliminate common predicates
    
    This PR is a simplified version of several filter optimization rules introduced in #3778 authored by scwf. Newly introduced optimizations include:
    
    1. `a && a` => `a`
    2. `a || a` => `a`
    3. `(a && b && c && ...) || (a && b && d && ...)` => `a && b && (c || d || ...)`
    
    The third rule is particularly useful for rewriting the following query, which is planned as a Cartesian product,
    
    ```sql
    SELECT *
      FROM t1, t2
     WHERE (t1.key = t2.key AND t1.value > 10)
        OR (t1.key = t2.key AND t2.value < 20)
    ```
    
    to the following one, which is planned into an equi-join:
    
    ```sql
    SELECT *
      FROM t1, t2
     WHERE t1.key = t2.key
       AND (t1.value > 10 OR t2.value < 20)
    ```
    
    The example above is quite artificial, but common predicates are likely to appear in real life complex queries (like the one mentioned in #3778).
    
    A difference between this PR and #3778 is that these optimizations are not limited to `Filter`, but are generalized to all logical plan nodes. Thanks to scwf for bringing up these optimizations, and chenghao-intel for the generalization suggestion.
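    
    The common-predicate extraction can be sketched on a toy expression tree. This is only an illustration under stated assumptions: `Expr`, `Pred`, `And`, `Or` and `CommonPredicates` are hypothetical names, not Catalyst classes, and the sketch handles only the factoring of shared conjuncts out of a disjunction.
    
    ```scala
    // Hypothetical mini expression tree, used only for this sketch.
    sealed trait Expr
    case class Pred(name: String) extends Expr
    case class And(left: Expr, right: Expr) extends Expr
    case class Or(left: Expr, right: Expr) extends Expr

    object CommonPredicates {
      // Flattens nested conjunctions: (a && b) && c becomes List(a, b, c).
      def conjuncts(e: Expr): List[Expr] = e match {
        case And(l, r) => conjuncts(l) ++ conjuncts(r)
        case other     => List(other)
      }

      // Rebuilds a conjunction from a non-empty list of operands.
      private def all(es: List[Expr]): Expr =
        es.reduceLeft[Expr]((l, r) => And(l, r))

      def simplify(e: Expr): Expr = e match {
        case And(l, r) =>
          val sl = simplify(l)
          val sr = simplify(r)
          if (sl == sr) sl else And(sl, sr)        // a && a => a
        case Or(l, r) =>
          val sl = simplify(l)
          val sr = simplify(r)
          if (sl == sr) sl                          // a || a => a
          else {
            val lhs = conjuncts(sl).distinct
            val rhs = conjuncts(sr).distinct
            val common = lhs.filter(rhs.contains)
            val restL = lhs.filterNot(common.contains)
            val restR = rhs.filterNot(common.contains)
            if (common.isEmpty) Or(sl, sr)
            // One side consists only of shared conjuncts, so it absorbs the other.
            else if (restL.isEmpty || restR.isEmpty) all(common)
            // (c1 && x) || (c1 && y) => c1 && (x || y)
            else And(all(common), Or(all(restL), all(restR)))
          }
        case other => other
      }
    }
    ```
    
    Applied to the query above, the shared `t1.key = t2.key` conjunct is pulled out of both branches of the `OR`, which is what exposes it as an equi-join condition.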
    
    Author: Cheng Lian <[email protected]>
    
    Closes #3784 from liancheng/normalize-filters and squashes the following commits:
    
    caca560 [Cheng Lian] Moves filter normalization into BooleanSimplification rule
    4ab3a58 [Cheng Lian] Fixes test failure, adds more tests
    5d54349 [Cheng Lian] Fixes typo in comment
    2abbf8e [Cheng Lian] Forgot our sacred Apache licence header...
    cf95639 [Cheng Lian] Adds an optimization rule for filter normalization
    liancheng authored and marmbrus committed Dec 30, 2014
    61a99f6
  16. [SPARK-4386] Improve performance when writing Parquet files

    Convert type of RowWriteSupport.attributes to Array.
    
    Analysis of performance when writing very wide tables shows that time is spent predominantly in the `apply` method of the `attributes` var. The type of `attributes` was previously `LinearSeqOptimized`, whose `apply` is O(N), which made each write O(N squared).
    
    Measurements on a 575-column table showed that this change made a 6x improvement in write times.
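    
    The effect can be illustrated with plain Scala collections. The sketch below is not the `RowWriteSupport` code; `WideRowSketch` and its methods are hypothetical and only show why positional lookup on a linked list makes a per-column loop quadratic, while an `Array` keeps it linear.
    
    ```scala
    // Hypothetical illustration; names are not from the Spark code base.
    object WideRowSketch {
      // Renders a row as "name=value" pairs, looking each name up by position.
      private def render(nameAt: Int => String, row: Array[Any]): String = {
        val sb = new StringBuilder
        var i = 0
        while (i < row.length) {
          if (i > 0) sb.append(',')
          sb.append(nameAt(i)).append('=').append(row(i))
          i += 1
        }
        sb.toString
      }

      // List.apply(i) walks i nodes, so the whole loop is O(N^2) in the column count.
      def renderWithList(attributes: List[String], row: Array[Any]): String =
        render(i => attributes(i), row)

      // Array.apply(i) is O(1), so the whole loop is O(N).
      def renderWithArray(attributes: Array[String], row: Array[Any]): String =
        render(i => attributes(i), row)
    }
    ```
    
    Both methods produce the same output; only the cost of each `attributes(i)` lookup differs, which is why a one-time conversion to `Array` pays off for wide tables.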
    
    Author: Michael Davies <[email protected]>
    
    Closes #3843 from MickDavies/SPARK-4386 and squashes the following commits:
    
    892519d [Michael Davies] [SPARK-4386] Improve performance when writing Parquet files
    MickDavies authored and marmbrus committed Dec 30, 2014
    7425bec
  17. [SPARK-4935][SQL] When hive.cli.print.header configured, spark-sql aborted if passed in a invalid sql

    If an invalid SQL statement such as ```abdcdfsfs``` was passed in, the spark-sql script aborted.
    
    Author: wangfei <[email protected]>
    Author: Fei Wang <[email protected]>
    
    Closes #3761 from scwf/patch-10 and squashes the following commits:
    
    46dc344 [Fei Wang] revert console.printError(rc.getErrorMessage())
    0330e07 [wangfei] avoid to print error message repeatedly
    1614a11 [wangfei] spark-sql abort when passed in a wrong sql
    scwf authored and marmbrus committed Dec 30, 2014
    8f29b7c
  18. [SPARK-4570][SQL]add BroadcastLeftSemiJoinHash

    JIRA issue: [SPARK-4570](https://issues.apache.org/jira/browse/SPARK-4570)
    We are planning to create a `BroadcastLeftSemiJoinHash` operator to implement the broadcast join for `left semijoin`.
    In a left semijoin, if the size of the data from the right side is smaller than the user-settable threshold `AUTO_BROADCASTJOIN_THRESHOLD`,
    the planner marks it as the `broadcast` relation and marks the other relation as the stream side. The broadcast table is sent to all of the executors involved in the join as an `org.apache.spark.broadcast.Broadcast` object, and the join uses `joins.BroadcastLeftSemiJoinHash`. Otherwise it uses `joins.LeftSemiJoinHash`.
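    
    The core of a hash-based left semi join can be sketched with plain collections. This is an illustration only, with a hypothetical `LeftSemiJoinSketch` object: in Spark the built key set would be shipped to executors as a broadcast variable, which is not modeled here.
    
    ```scala
    // Hypothetical local model of a broadcast left semi join.
    object LeftSemiJoinSketch {
      // Keeps each left row whose key occurs on the right side.
      // Right-side values and duplicate keys never affect the output.
      def leftSemiJoin[K, V](left: Seq[(K, V)], rightKeys: Iterable[K]): Seq[(K, V)] = {
        // "Build" side: the small relation, reduced to a set of keys.
        val built: Set[K] = rightKeys.toSet
        // "Stream" side: the large relation, probed row by row.
        left.filter { case (key, _) => built.contains(key) }
      }
    }
    ```
    
    Because only key membership matters, the small side can be collapsed into a set once and probed in a single pass over the large side.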
    
    The benchmark suggests this made the optimized version more than 4x faster for `left semijoin`:
    <pre><code>
    Original:
    left semi join : 9288 ms
    Optimized:
    left semi join : 1963 ms
    </code></pre>
    The micro benchmark loads `data1/kv3.txt` into a normal Hive table.
    Benchmark code:
    <pre><code>
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    // Runs `f` once and returns the elapsed wall-clock time in milliseconds.
    def benchmark(f: => Unit): Long = {
      val begin = System.currentTimeMillis()
      f
      val end = System.currentTimeMillis()
      end - begin
    }

    val sc = new SparkContext(
      new SparkConf()
        .setMaster("local")
        .setAppName(getClass.getSimpleName.stripSuffix("$")))
    val hiveContext = new HiveContext(sc)
    import hiveContext._

    // Recreate both tables so every run starts from the same state.
    sql("drop table if exists left_table")
    sql("drop table if exists right_table")
    sql("create table left_table (key int, value string)")
    sql("""load data local inpath "/data1/kv3.txt" into table left_table""")
    sql("create table right_table (key int, value string)")
    // Copy left_table into right_table.
    sql(
      """
        |from left_table
        |insert overwrite table right_table
        |select left_table.key, left_table.value
      """.stripMargin)

    val leftSemiJoin = sql(
      """select a.key from left_table a
        |left semi join right_table b on a.key = b.key""".stripMargin)
    // count() forces the join to execute.
    val leftSemiJoinDuration = benchmark(leftSemiJoin.count())
    println(s"left semi join : $leftSemiJoinDuration ms ")
    </code></pre>
    
    Author: wangxiaojing <[email protected]>
    
    Closes #3442 from wangxiaojing/SPARK-4570 and squashes the following commits:
    
    a4a43c9 [wangxiaojing] rebase
    f103983 [wangxiaojing] change style
    fbe4887 [wangxiaojing] change style
    ff2e618 [wangxiaojing] add testsuite
    1a8da2a [wangxiaojing] add BroadcastLeftSemiJoinHash
    wangxiaojing authored and marmbrus committed Dec 30, 2014
    07fa191
  19. SPARK-3955 part 2 [CORE] [HOTFIX] Different versions between jackson-mapper-asl and jackson-core-asl

    pwendell 2483c1e didn't actually add a reference to `jackson-core-asl` as intended, but a second, redundant reference to `jackson-mapper-asl`, as markhamstra picked up on (#3716 (comment)). This just rectifies the typo. I missed it as well; the original PR #2818 had it correct and I also didn't see the problem.
    
    Author: Sean Owen <[email protected]>
    
    Closes #3829 from srowen/SPARK-3955 and squashes the following commits:
    
    6cfdc4e [Sean Owen] Actually refer to jackson-core-asl
    srowen authored and pwendell committed Dec 30, 2014
    b239ea1
  20. [Spark-4995] Replace Vector.toBreeze.activeIterator with foreachActive

    A new `foreachActive` method on vectors was introduced by SPARK-4431 as a more efficient alternative to `vector.toBreeze.activeIterator`. There are some parts of the codebase where it had not yet been replaced.
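    
    To show the shape of such a method, here is a sketch on a hypothetical `SparseVec` class. It is a stand-in for illustration, not MLlib's vector type: the callback visits only the stored entries and avoids building an iterator of `(index, value)` pairs.
    
    ```scala
    // Hypothetical sparse vector; not the MLlib implementation.
    final class SparseVec(val size: Int, val indices: Array[Int], val values: Array[Double]) {
      // Calls f(index, value) for every stored entry, without allocating tuples.
      def foreachActive(f: (Int, Double) => Unit): Unit = {
        var k = 0
        while (k < indices.length) {
          f(indices(k), values(k))
          k += 1
        }
      }

      // Example consumer: expand to a dense array by visiting active entries only.
      def toDense: Array[Double] = {
        val out = new Array[Double](size)
        foreachActive { (i, v) => out(i) = v }
        out
      }
    }
    ```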
    
    dbtsai
    
    Author: Jakub Dubovsky <[email protected]>
    
    Closes #3846 from james64/SPARK-4995-foreachActive and squashes the following commits:
    
    3eb7e37 [Jakub Dubovsky] Scalastyle fix
    32fe6c6 [Jakub Dubovsky] activeIterator removed - IndexedRowMatrix.toBreeze
    47a4777 [Jakub Dubovsky] activeIterator removed in RowMatrix.toBreeze
    90a7d98 [Jakub Dubovsky] activeIterator removed in MLUtils.saveAsLibSVMFile
    Jakub Dubovsky authored and mengxr committed Dec 30, 2014
    0f31992
  21. [SPARK-4813][Streaming] Fix the issue that ContextWaiter didn't handle 'spurious wakeup'
    
    Used `Condition` to rewrite `ContextWaiter` because it provides a convenient API `awaitNanos` for timeout.
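    
    A minimal sketch of a wait loop that tolerates spurious wakeups, built on `java.util.concurrent.locks.Condition`. The `StopWaiter` class is hypothetical and much simpler than `ContextWaiter`; it only shows the pattern of re-checking the predicate and carrying over the remaining time returned by `awaitNanos`.
    
    ```scala
    import java.util.concurrent.TimeUnit
    import java.util.concurrent.locks.ReentrantLock

    // Hypothetical waiter for illustration; not the ContextWaiter code.
    class StopWaiter {
      private val lock = new ReentrantLock()
      private val condition = lock.newCondition()
      private var stopped = false // guarded by `lock`

      def notifyStop(): Unit = {
        lock.lock()
        try {
          stopped = true
          condition.signalAll()
        } finally {
          lock.unlock()
        }
      }

      // Returns true if stop was signalled before the timeout elapsed.
      def waitForStop(timeoutMs: Long): Boolean = {
        lock.lock()
        try {
          var nanosLeft = TimeUnit.MILLISECONDS.toNanos(timeoutMs)
          // Loop on the predicate: a wakeup without `stopped` being set just
          // re-enters the wait with whatever time remains.
          while (!stopped && nanosLeft > 0) {
            nanosLeft = condition.awaitNanos(nanosLeft)
          }
          stopped
        } finally {
          lock.unlock()
        }
      }
    }
    ```
    
    `awaitNanos` returns an estimate of the time still remaining, so the loop never waits longer than the original timeout in total.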
    
    Author: zsxwing <[email protected]>
    
    Closes #3661 from zsxwing/SPARK-4813 and squashes the following commits:
    
    52247f5 [zsxwing] Add explicit unit type
    be42bcf [zsxwing] Update as per review suggestion
    e06bd4f [zsxwing] Fix the issue that ContextWaiter didn't handle 'spurious wakeup'
    zsxwing authored and tdas committed Dec 30, 2014
    6a89782
  22. [SPARK-4998][MLlib]delete the "train" function

    The "train" function defined in "class DecisionTree" hides the functions with the same name in "object DecisionTree".
    Deleting it makes the functions in the object usable, especially when using Java reflection.
    
    JIRA[SPARK-4998]
    
    Author: Liu Jiongzhou <[email protected]>
    
    Closes #3836 from ljzzju/master and squashes the following commits:
    
    4e13133 [Liu Jiongzhou] [MLlib]delete the "train" function
    ljzzju authored and mengxr committed Dec 30, 2014
    035bac8

Commits on Dec 31, 2014

  1. [SPARK-1010] Clean up uses of System.setProperty in unit tests

    Several of our tests call System.setProperty (or test code which implicitly sets system properties) and don't always reset/clear the modified properties, which can create ordering dependencies between tests and cause hard-to-diagnose failures.
    
    This patch removes most uses of System.setProperty from our tests, since in most cases we can use SparkConf to set these configurations (there are a few exceptions, including the tests of SparkConf itself).
    
    For the cases where we continue to use System.setProperty, this patch introduces a `ResetSystemProperties` ScalaTest mixin class which snapshots the system properties before individual tests and automatically restores them on test completion / failure.  See the block comment at the top of the ResetSystemProperties class for more details.
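    
    The snapshot-and-restore idea can be sketched without ScalaTest. The `SystemPropertiesSnapshot` helper below is hypothetical and is not the `ResetSystemProperties` trait itself; it only shows the save, run, restore sequence around a block of code.
    
    ```scala
    import java.util.Properties

    // Hypothetical helper for illustration; the real mixin hooks into the test lifecycle.
    object SystemPropertiesSnapshot {
      // Runs `body`, then restores the system properties to what they were before,
      // whether the body completed normally or threw.
      def withRestoredSystemProperties[T](body: => T): T = {
        // Copy the current properties so later mutations do not affect the snapshot.
        val snapshot = System.getProperties.clone().asInstanceOf[Properties]
        try body
        finally System.setProperties(snapshot)
      }
    }
    ```
    
    Any property set or cleared inside the block is undone afterwards, which removes the ordering dependencies between tests described above.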
    
    Author: Josh Rosen <[email protected]>
    
    Closes #3739 from JoshRosen/cleanup-system-properties-in-tests and squashes the following commits:
    
    0236d66 [Josh Rosen] Replace setProperty uses in two example programs / tools
    3888fe3 [Josh Rosen] Remove setProperty use in LocalJavaStreamingContext
    4f4031d [Josh Rosen] Add note on why SparkSubmitSuite needs ResetSystemProperties
    4742a5b [Josh Rosen] Clarify ResetSystemProperties trait inheritance ordering.
    0eaf0b6 [Josh Rosen] Remove setProperty call in TaskResultGetterSuite.
    7a3d224 [Josh Rosen] Fix trait ordering
    3fdb554 [Josh Rosen] Remove setProperty call in TaskSchedulerImplSuite
    bee20df [Josh Rosen] Remove setProperty calls in SparkContextSchedulerCreationSuite
    655587c [Josh Rosen] Remove setProperty calls in JobCancellationSuite
    3f2f955 [Josh Rosen] Remove System.setProperty calls in DistributedSuite
    cfe9cce [Josh Rosen] Remove use of system properties in SparkContextSuite
    8783ab0 [Josh Rosen] Remove TestUtils.setSystemProperty, since it is subsumed by the ResetSystemProperties trait.
    633a84a [Josh Rosen] Remove use of system properties in FileServerSuite
    25bfce2 [Josh Rosen] Use ResetSystemProperties in UtilsSuite
    1d1aa5a [Josh Rosen] Use ResetSystemProperties in SizeEstimatorSuite
    dd9492b [Josh Rosen] Use ResetSystemProperties in AkkaUtilsSuite
    b0daff2 [Josh Rosen] Use ResetSystemProperties in BlockManagerSuite
    e9ded62 [Josh Rosen] Use ResetSystemProperties in TaskSchedulerImplSuite
    5b3cb54 [Josh Rosen] Use ResetSystemProperties in SparkListenerSuite
    0995c4b [Josh Rosen] Use ResetSystemProperties in SparkContextSchedulerCreationSuite
    c83ded8 [Josh Rosen] Use ResetSystemProperties in SparkConfSuite
    51aa870 [Josh Rosen] Use withSystemProperty in ShuffleSuite
    60a63a1 [Josh Rosen] Use ResetSystemProperties in JobCancellationSuite
    14a92e4 [Josh Rosen] Use withSystemProperty in FileServerSuite
    628f46c [Josh Rosen] Use ResetSystemProperties in DistributedSuite
    9e3e0dd [Josh Rosen] Add ResetSystemProperties test fixture mixin; use it in SparkSubmitSuite.
    4dcea38 [Josh Rosen] Move withSystemProperty to TestUtils class.
    JoshRosen committed Dec 31, 2014
    352ed6b