
SPARK-1556: bump jets3t version to 0.9.0 #468

Closed
wants to merge 3 commits

Conversation

CodingCat
Contributor

Hadoop 2.3.0 and newer depend on Jets3t 0.9.0, which defines S3ServiceException/ServiceException; however, Spark still relies on Jets3t 0.7.x, which does not define these classes.

What I hit (when trying to load data from S3) is the following:

14/04/21 19:30:53 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
14/04/21 19:30:53 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
14/04/21 19:30:53 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
14/04/21 19:30:53 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
14/04/21 19:30:53 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
java.lang.NoClassDefFoundError: org/jets3t/service/S3ServiceException
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.createDefaultStore(NativeS3FileSystem.java:280)
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:270)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2316)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:90)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2350)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2332)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:369)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:221)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:270)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:140)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:205)
at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:205)
at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:205)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:891)
at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:741)
at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:692)
at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:574)
at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:900)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:15)
at $iwC$$iwC$$iwC.<init>(<console>:20)
at $iwC$$iwC.<init>(<console>:22)
at $iwC.<init>(<console>:24)
at <init>(<console>:26)
at .<init>(<console>:30)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:772)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1040)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:609)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:640)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:604)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:793)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:838)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:750)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:598)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:605)
at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:608)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:931)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:881)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:881)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:881)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:973)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
Caused by: java.lang.ClassNotFoundException: org.jets3t.service.S3ServiceException
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 63 more
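
For context, the change this PR proposes is essentially a bump of the jets3t dependency in the build. A minimal sketch of what that looks like in pom.xml terms, assuming the usual net.java.dev.jets3t:jets3t coordinates (the actual diff may differ in details such as exclusions):

```xml
<!-- Sketch only: bump jets3t so the classes Hadoop 2.3+ expects
     (S3ServiceException/ServiceException) are on the classpath. -->
<dependency>
  <groupId>net.java.dev.jets3t</groupId>
  <artifactId>jets3t</artifactId>
  <version>0.9.0</version> <!-- previously 0.7.x -->
</dependency>
```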

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14297/

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14299/

@CodingCat CodingCat changed the title SPARK-1556: bump jet3st version to 0.9.0 SPARK-1556: bump jets3t version to 0.9.0 Apr 21, 2014
@mateiz
Contributor

mateiz commented Apr 22, 2014

Unfortunately this will not work in older Hadoop versions as far as I know. Can you still build Spark against Hadoop 1.0.4 and run it with this change?

It might be better to receive jets3t from Hadoop instead of depending on it ourselves. I'm not sure if hadoop-client depends on it...

@srowen
Member

srowen commented Apr 22, 2014

@mateiz I thought the same thing, that hadoop-client pulls this in, but it does not. Only things like hadoop-hdfs.

I agree with updating the dependency, but to match the Hadoop version. So the 0.9.0 version belongs in the Hadoop 2 profiles.

(Also it should be a runtime scope dependency in Maven.)
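
To make that suggestion concrete, here is a hedged sketch of a profile-scoped, runtime-scope jets3t dependency in the parent pom (the profile id and property name are illustrative, not the final patch; the profile belongs under `<profiles>` and the dependency under `<dependencies>`):

```xml
<!-- Illustrative only: a Hadoop-2-era profile overrides the jets3t version,
     and the dependency itself gets runtime scope. -->
<profile>
  <id>hadoop-2.3</id>  <!-- hypothetical profile name -->
  <properties>
    <jets3t.version>0.9.0</jets3t.version>
  </properties>
</profile>

<dependency>
  <groupId>net.java.dev.jets3t</groupId>
  <artifactId>jets3t</artifactId>
  <version>${jets3t.version}</version>
  <scope>runtime</scope>
</dependency>
```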

@mateiz
Contributor

mateiz commented Apr 22, 2014

In that case let's see exactly which Hadoop 2.x version bumped up the dependency, because I don't think 2.0 and 2.1 did it (could be wrong though).

@srowen
Member

srowen commented Apr 22, 2014

@mateiz It looks like it went to 0.8.1 in (the unreleased) Hadoop 1.3.0 (https://issues.apache.org/jira/browse/HADOOP-8136) and 0.9.0 in 2.3.0 (https://issues.apache.org/jira/browse/HADOOP-9623)

@mateiz
Contributor

mateiz commented Apr 22, 2014

Great, so there's no easy way to set it based on profiles and support all Hadoop versions :). Maybe for Hadoop 2.3+ users, we can just tell them to add a new version of jets3t to their own project's build? We can certainly have our pre-built binaries include the right one too.

@CodingCat
Contributor Author

Hi @mateiz @srowen, if Spark built with Hadoop 1.0.4/2.x (x < 3) and jets3t 0.9.0 can access S3 smoothly, does that also mean bumping to 0.9.0 is safe?

I'm going to run a manual test tonight or tomorrow.

@mateiz
Contributor

mateiz commented Apr 22, 2014

Sure, that would work. Please try it. Unfortunately I remember it having problems, but I could be wrong.

@CodingCat
Contributor Author

@mateiz you are right, I got `java.lang.NoSuchMethodError: org.jets3t.service.impl.rest.httpclient.RestS3Service.<init>(Lorg/jets3t/service/security/AWSCredentials;)V` in both cases.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@CodingCat
Contributor Author

I reverted the build files and updated the docs to describe this situation for users.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14551/

@darose

darose commented Apr 29, 2014

Is there any way to apply this fix without a rebuild of spark? E.g., to just replace jets3t-0.7.1.jar with jets3t-0.9.0.jar in a deployed spark package? I'm running into this issue on a machine where I have the CDH5 hadoop and spark packages installed.

@CodingCat
Contributor Author

I think the possible way to do that is to compile a jets3t-0.9.0-enabled version yourself,

then compile your application against that version... To access an HDFS-compatible fs, I think we eventually call the code in the application jar.

@mateiz
Contributor

mateiz commented Apr 30, 2014

You can try adding jets3t 0.9 as a Maven dependency in your application, but unfortunately I think that goes after the Spark assembly JAR when running an app. In 1.0 there will be a setting to put the user's classpath first.

It sounds like the Spark bundle for CDH needs to be updated with this; CCing @srowen.

For this patch, we probably want to create a new Maven profile that uses the newer Jets3t when it's enabled.

@CodingCat
Contributor Author

@mateiz for @darose's question, how about compiling the application against a customized Spark jar (with the newer jets3t)? In that case, I think he wouldn't need to restart the cluster?

@mateiz
Contributor

mateiz commented Apr 30, 2014

BTW the right way to do it would be to make hadoop-client have a Maven dependency on the right version of Jets3t. Then Spark would just build with the right version out of the box when it linked to the right Hadoop version.

@darose

darose commented Apr 30, 2014

Definitely worth a shot! Will give that a try and report back.

@CodingCat
Contributor Author

Hi @srowen, do you want to take over the patch? I'm concerned I can't fix it in the coming days, given my schedule and my level of knowledge of mvn and sbt.

@darose

darose commented Apr 30, 2014

Sigh. Was a promising idea, but no dice. Even with the 0.7 jars out of the way, I'm still getting java.lang.NoClassDefFoundError: org/jets3t/service/S3ServiceException
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.createDefaultStore(NativeS3FileSystem.java:280)
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:270)
...
at shark.SharkCliDriver.main(SharkCliDriver.scala)

@srowen
Member

srowen commented Apr 30, 2014

@CodingCat I can make a patch, but it will mean introducing a new profile like "hadoop230" that one has to enable when building for Hadoop 2.3.0. I always hate to add that complexity and hope someone has a better idea. But I'll propose the PR if a committer nods and says it's worth changing.

I imagine it won't be the last time the dependencies have to be fudged by Hadoop version -- isn't this already an existing issue with Avro anyway?

@darose

darose commented Apr 30, 2014

FYI - I think I might have figured out why deleting the jets3t jar didn't fix the issue. It looks like the Spark build process bundles the jets3t classes into the Spark assembly jar. So I'm guessing that whacking the stand-alone jar file wouldn't fix the issue if there are still 0.7 classes bundled in another jar.

@darose

darose commented May 2, 2014

Man oh man, I cannot get this to work no way no how. I tried rebuilding spark using the jets3t 0.9 jar, then tried rebuilding shark doing the same. I keep getting a verify error - presumably because something in the call stack isn't compatible with the new jets3t version. Anyone have any ideas/suggestions? I'm at my wits' end on this. Spent days, and still unable to get a working version of spark/shark running with CDH5. Output below.

14/05/02 06:34:14 WARN scheduler.TaskSetManager: Loss was due to java.lang.VerifyError
java.lang.VerifyError: Bad type on operand stack
Exception Details:
  Location:
    org/apache/hadoop/fs/s3native/Jets3tNativeFileSystemStore.initialize(Ljava/net/URI;Lorg/apache/hadoop/conf/Configuration;)V @38: invokespecial
  Reason:
    Type 'org/jets3t/service/security/AWSCredentials' (current frame, stack[3]) is not assignable to 'org/jets3t/service/security/ProviderCredentials'
  Current Frame:
    bci: @38
    flags: { }
    locals: { 'org/apache/hadoop/fs/s3native/Jets3tNativeFileSystemStore', 'java/net/URI', 'org/apache/hadoop/conf/Configuration', 'org/apache/hadoop/fs/s3/S3Credentials', 'org/jets3t/service/security/AWSCredentials' }
    stack: { 'org/apache/hadoop/fs/s3native/Jets3tNativeFileSystemStore', uninitialized 32, uninitialized 32, 'org/jets3t/service/security/AWSCredentials' }
  Bytecode:
    0000000: bb00 0259 b700 034e 2d2b 2cb6 0004 bb00
    0000010: 0559 2db6 0006 2db6 0007 b700 083a 042a
    0000020: bb00 0959 1904 b700 0ab5 000b a700 0b3a
    0000030: 042a 1904 b700 0d2a 2c12 0e03 b600 0fb5
    0000040: 0010 2a2c 1211 1400 12b6 0014 1400 15b8
    0000050: 0017 b500 182a 2c12 1914 0015 b600 1414
    0000060: 0015 b800 17b5 001a 2abb 001b 592b b600
    0000070: 1cb7 001d b500 1eb1                    
  Exception Handler Table:
    bci [14, 44] => handler: 47
  Stackmap Table:
    full_frame(@47,{Object[#176],Object[#177],Object[#178],Object[#179]},{Object[#180]})
    same_frame(@55)

        at org.apache.hadoop.fs.s3native.NativeS3FileSystem.createDefaultStore(NativeS3FileSystem.java:280)
        at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:270)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2397)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
        at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:107)
        at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
        at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:156)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:149)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:64)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
        at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:34)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:34)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:34)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
        at org.apache.spark.scheduler.Task.run(Task.scala:53)
        at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:41)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
        at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:41)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)

@darose

darose commented May 2, 2014

I think I'm going to have to give up on getting Shark working on my existing CDH5 cluster right now. I've tried everything I can think of (various binary releases, building both spark and shark myself against jets3t 0.9, various config tweaks, etc.) but I'm stuck at either the class not found error in https://issues.apache.org/jira/browse/SPARK-1556, or the verify error above. I'll have to either wait until there's a new binary release, or look for an alternative.

@pwendell
Contributor

pwendell commented May 3, 2014

@srowen I'd prefer not to remove it from the dependency graph if possible because it will break local builds. The best solution I see is to add a profile for Hadoop 2.3 and 2.4. For now I'd be fine to just require users to manually trigger it and document this in building-with-maven. In SBT we can actually just insert logic in the build based on the Hadoop profile. I'm guessing we'll have to get into the habit of doing this, since it seems like Spark is good at finding bugs in Hadoop's dependency graph. We should probably start testing Spark against Hadoop RC's if they publish them to maven so we can give feedback.

I don't quite understand why the hadoop-client library doesn't advertise jets3t specifically... if I write a Java application that opens an S3 FileSystem and reads and writes data, don't I need jets3t to do that (i.e. if this is outside a MapReduce job)? Is this just a bug in Hadoop's dependencies?

@pwendell
Contributor

pwendell commented May 3, 2014

@srowen if you'd like to take a crack at this by the way, please do. I'll probably look at it on Sunday if no one else has.

@srowen
Member

srowen commented May 3, 2014

@pwendell Before I begin, can I propose a refactoring of the profiles that will make this and similar issues easier to deal with? It's probably for a different PR, but it should make changes like this one easy.

We need profiles to deal with this. Profiles can be triggered explicitly (e.g. -Phadoop-2.3) or by property values (-Dhadoop.version=2.3.0). It's necessary to have things like hadoop.version be customizable, so it would be nice to also trigger the needed profiles from that. However, Maven lacks the ability to trigger on a range of property values; you can trigger on a particular value like "2.3.0" but not on "2.3.*" or "[2.3.0,2.4.0)" syntax.

So it seems necessary to use a series of named profiles. Those profiles can set default version values, and those versions can be overridden. For example, it's nice to have a hadoop-2.3 profile set hadoop.version=2.3.0 for you, even though that can still be overridden.

(The SBT build can shadow these changes.)

After reading over the build and docs, I propose the following:

  • Introduce a hadoop-2.3 profile, similar to hadoop-0.23, to encompass 2.3+-specific build changes, and one for hadoop-2.2 as well (see later)
  • hadoop.major.version appears to be unused -- remove it?
  • I believe yarn.version can be removed; use hadoop.version in its place. Ideally these are always synced, no? All doc examples show yarn.version matching hadoop.version and the distribution script uses SPARK_HADOOP_VERSION for yarn.version. Now, the default Hadoop version is 1.0.4 and there is no such YARN version. But the yarn-alpha profile sets hadoop.version=0.23.7 to match the default yarn.version=0.23.7 anyway. It seems like Hadoop 1.x + YARN is not intended anyway, which seems corroborated by the build documentation.
  • So, YARN-related profiles should not set hadoop.version, and in fact only serve to add the yarn child module

... and then the fix for this issue is trivial.

All of the build permutations listed in the documentation work under this new arrangement. Anyone want to see a PR or have objections?
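
As a rough illustration of the proposal (the profile name and versions are examples, not the final change), a named profile under `<profiles>` could set an overridable default like this:

```xml
<!-- Sketch: -Phadoop-2.3 sets a default hadoop.version, which a command-line
     -Dhadoop.version=2.4.0 can still override, since user properties win. -->
<profile>
  <id>hadoop-2.3</id>
  <properties>
    <hadoop.version>2.3.0</hadoop.version>
    <jets3t.version>0.9.0</jets3t.version>
  </properties>
</profile>
```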

@witgo
Contributor

witgo commented May 3, 2014

@srowen Not everyone uses the same version of HDFS and YARN.

@srowen
Member

srowen commented May 3, 2014

@witgo Hm, is there an example that comes up repeatedly? Is it ever intentional, or just some accident of someone's legacy deployment? I don't know of a case of this, and it wouldn't come up with a distro or any semi-recent release of Hadoop, but maybe someone will say this comes up with the 1.x / 0.23.x lines somehow?

@witgo
Contributor

witgo commented May 3, 2014

@srowen Related discussion in PR 502.
@berngp Can you explain the reason for not using the same version of HDFS and YARN?

@berngp
Contributor

berngp commented May 3, 2014

I think in general this is an edge case, but there are folks still using HDFS 1.0.x with a different version of YARN; that said, it is not my case.

I like what you suggested in another PR, where you reused the value of hadoop.version to specify yarn.version, e.g.

<yarn.version>${hadoop.version}</yarn.version>

Let me know if I should associate the small commits with specific PRs. Thanks again for following up on those commits.


@pwendell
Contributor

pwendell commented May 3, 2014

@srowen The YARN version does need to be separate from the Hadoop version. Downstream consumers of our build sometimes do this, for instance if they want to build against a custom HDFS distro (e.g. Pivotal, IBM, or something) but link against the upstream Apache YARN artifacts. It's not something we do in the binaries we distribute, but it would be good to support it.

Think it's fine to remove hadoop.major.version - it seems unused.

Adding fancy profile activation would also be nice, but I think that it's not necessary as an immediate fix. We can just say in the build doc that "you need special profiles for the following hadoop versions" and give a small table or list explaining which profiles to activate.
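
For what it's worth, keeping the two properties separate while defaulting them together could look roughly like this (a sketch based on the suggestion earlier in the thread; values are illustrative):

```xml
<properties>
  <hadoop.version>1.0.4</hadoop.version>
  <!-- Defaults to the Hadoop version, but a downstream build can still pass
       e.g. -Dyarn.version=2.2.0 to link a vendor HDFS with upstream YARN. -->
  <yarn.version>${hadoop.version}</yarn.version>
</properties>
```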

asfgit pushed a commit that referenced this pull request May 5, 2014
…ions

See related discussion at #468

This PR may still overstep what you have in mind, but let me put it on the table to start. Besides fixing the issue, it has one substantive change, and that is to manage Hadoop-specific things only in Hadoop-related profiles. This does _not_ remove `yarn.version`.

- Moves the YARN and Hadoop profiles together in pom.xml. Sorry that this makes the diff a little hard to grok but the changes are only as follows.
- Removes `hadoop.major.version`
- Introduce `hadoop-2.2` and `hadoop-2.3` profiles to control Hadoop-specific changes:
  - like the protobuf version issue - this was only 'solved' now by enabling YARN for 2.2+, which is really an orthogonal issue
  - like the jets3t version issue now
- Hadoop profiles set an appropriate default `hadoop.version`, that can be overridden
- _(YARN profiles in the parent now only exist to add the sub-module)_
- Fixes the jets3t dependency issue
 - and makes it a runtime dependency
 - and centralizes config of this guy in the parent pom
- Updates build docs
- Updates SBT build too
  - and fixes a regex problem along the way

Author: Sean Owen <[email protected]>

Closes #629 from srowen/SPARK-1556 and squashes the following commits:

c3fa967 [Sean Owen] Fix hadoop-2.4 profile typo in doc
a2105fd [Sean Owen] Add hadoop-2.4 profile and don't set hadoop.version in profiles
274f4f9 [Sean Owen] Make jets3t a runtime dependency, and bring its exclusion up into parent config
bbed826 [Sean Owen] Use jets3t 0.9.0 for Hadoop 2.3+ (and correct similar regex issue in SBT build)
f21f356 [Sean Owen] Build changes to set up for jets3t fix
(cherry picked from commit 73b0cbc)

Signed-off-by: Patrick Wendell <[email protected]>
asfgit pushed a commit that referenced this pull request May 5, 2014
@CodingCat
Contributor Author

fixed in #629

@CodingCat CodingCat closed this May 5, 2014
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
@LuqmanSahaf

@darose I am facing the VerifyError you mentioned in one of the comments. Can you tell me how you solved that error?

@mag-

mag- commented Apr 27, 2015

Are you aware that all these regexp hacks will break when Hadoop changes its version to 3.0.0?

@srowen
Member

srowen commented Apr 27, 2015

@mag- if you're talking about what I think you are, it was a temporary thing that's long since gone already: https://github.com/apache/spark/pull/629/files

@mag-

mag- commented Apr 27, 2015

Well:
val jets3tVersion = if ("^2\\.[3-9]+".r.findFirstIn(hadoopVersion).isDefined) "0.9.0" else "0.7.1"
It probably should be the other way round: if the Hadoop version is lower than 2.3, we use 0.7.1.
Also, someone needs to test it with Hadoop 2.6/2.7, where S3 support was split out into hadoop-aws.
(I'm thinking the mvn profile approach was maybe cleaner than this if/else...)

@srowen
Member

srowen commented Apr 27, 2015

Agree but that doesn't exist in master anyway. Now the SBT build drives off the Maven build.

@darose

darose commented Apr 28, 2015

I think @srowen is correct. A while back I upgraded to use a newer version of Spark (and built it using the correct -Dhadoop.version= and -Phadoop- flags) and the problem went away.


j-esse pushed a commit to j-esse/spark that referenced this pull request Jan 24, 2019
Allow passing env variables for conda so that we can enable instrumentation/other flags when required.
arjunshroff pushed a commit to arjunshroff/spark that referenced this pull request Nov 24, 2020