
Fix: multiple bugs on spark-shell usage #57

Closed
bzz wants to merge 4 commits into master from fix/spark-shell-guava

Conversation

@bzz (Contributor) commented Sep 22, 2017

Submitting a fatJar after adding bblfsh-client results in runtime exceptions, but only when used with spark-submit/spark-shell:

java.lang.NoSuchMethodError: com.google.common.util.concurrent.MoreExecutors.directExecutor()Ljava/util/concurrent/Executor;
	at io.grpc.internal.ClientCallImpl.<init>(ClientCallImpl.java:104)
	at io.grpc.internal.ManagedChannelImpl$RealChannel.newCall(ManagedChannelImpl.java:554)
	at io.grpc.internal.ManagedChannelImpl.newCall(ManagedChannelImpl.java:533)
	at gopkg.in.bblfsh.sdk.v1.protocol.generated.ProtocolServiceGrpc$ProtocolServiceBlockingStub.parse(ProtocolServiceGrpc.scala:37)
	at org.bblfsh.client.BblfshClient.parse(BblfshClient.scala:23)

This PR includes:

  • update CI to run spark-shell --jars

  • fix Caused by: java.lang.IllegalArgumentException: Allocation size must be greater than zero on .classifyLanguages(); ideally this should be fixed in "Add failing test for empty file content" (enry#106) or elsewhere upstream

  • work around Caused by: java.lang.NoSuchMethodError: com.google.common.util.concurrent.MoreExecutors.directExecutor()Ljava/util/concurrent/Executor; on .extractUASTs()

    This is the result of Spark having an old version of Guava on its runtime classpath while gRPC needs a newer one. The best solution would be for gRPC to re-shade Guava in their own build ("Would it be possible to shade guava?" grpc/grpc-java#2688), but it will take a while for ScalaPB to adopt that.

    Another option could be to re-shade it on our side in the fatJar build (see the sbt sketch after this list). This should be handled in subsequent PRs.

    There is a workaround: ship a newer Guava AND have clients of SparkAPI set configuration properties in their apps (before the Spark context is created, i.e. through spark-submit --properties-file):

     spark.driver.userClassPathFirst=true
    

    CI uses this workaround. It can be automated on the client side by putting the above into $SPARK_HOME/conf/spark-defaults.conf (see the configuration example after this list).

  • for some reason, spark.executor.userClassPathFirst=true results in a 🐞:

    Caused by: java.lang.ClassCastException: cannot assign instance of tech.sourced.api.provider.SivaRDDProvider$$anonfun$tech$sourced$api$provider$SivaRDDProvider$$generateRDD$1 to field org.apache.spark.rdd.RDD$$anonfun$map$1$$anonfun$apply$5.cleanF$1 of type scala.Function1 in instance of org.apache.spark.rdd.RDD$$anonfun$map$1$$anonfun$apply$5
    
  • workaround: use a non-fat jar + --packages (see the example after this list)
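
A minimal sketch of the re-shading option mentioned above, assuming the fatJar is built with sbt-assembly; the shaded package prefix is only illustrative:

    // build.sbt (sketch): rename Guava's packages inside the fatJar so that
    // gRPC uses the bundled newer Guava and cannot pick up Spark's older copy
    assemblyShadeRules in assembly := Seq(
      ShadeRule.rename("com.google.common.**" -> "shaded.guava.@1").inAll
    )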
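
A sketch of the userClassPathFirst workaround as a client would apply it; the file name and artifact coordinates below are placeholders, not the real ones:

    # spark.properties, passed via: spark-submit --properties-file spark.properties
    # (or put the same line into $SPARK_HOME/conf/spark-defaults.conf)
    spark.driver.userClassPathFirst=true

    # non-fat jar variant: pull the library in through --packages instead of a fatJar
    spark-shell \
      --conf spark.driver.userClassPathFirst=true \
      --packages "com.example:spark-api:x.y.z"   # placeholder coordinates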

@bzz force-pushed the fix/spark-shell-guava branch 2 times, most recently from a8b2331 to c986354 on September 22, 2017 16:40
@bzz force-pushed the fix/spark-shell-guava branch 2 times, most recently from 8320647 to 4d62688 on September 22, 2017 17:01
@bzz (Contributor, Author) commented Sep 22, 2017

After many experiments I could not find a way to work around the different Guava version at runtime while submitting a fatJar.

The only way that works for me is submitting a regular jar and providing all the dependencies through --packages, with spark.{driver, executor}.userClassPathFirst=true set.

Adding a ./spark-shell wrapper that implements this approach for CI.
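
A rough sketch of what such a wrapper could look like; the artifact coordinates are placeholders, the real script is the one added in this PR:

    #!/bin/sh
    # ./spark-shell: thin CI wrapper around the real spark-shell.
    # Dependencies come in as a regular (non-fat) jar via --packages, and the
    # user classpath is forced first so the newer Guava wins over Spark's copy.
    exec "$SPARK_HOME/bin/spark-shell" \
      --conf spark.driver.userClassPathFirst=true \
      --conf spark.executor.userClassPathFirst=true \
      --packages "com.example:spark-api:x.y.z" \
      "$@"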

@bzz force-pushed the fix/spark-shell-guava branch 6 times, most recently from d9bb0c5 to b17756f on September 22, 2017 21:29
@codecov (bot) commented Sep 22, 2017

Codecov Report

Merging #57 into master will increase coverage by 0.15%.
The diff coverage is n/a.


@@             Coverage Diff              @@
##             master      #57      +/-   ##
============================================
+ Coverage     84.12%   84.28%   +0.15%     
  Complexity       40       40              
============================================
  Files            17       17              
  Lines           441      439       -2     
  Branches         76       76              
============================================
- Hits            371      370       -1     
  Misses           33       33              
+ Partials         37       36       -1
Impacted Files                                          Coverage Δ          Complexity Δ
...n/scala/tech/sourced/api/udf/ExtractUASTsUDF.scala   100% <ø> (+4.34%)   0 <0> (ø) ⬇️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 99ea6c0...9a8ed37. Read the comment docs.

@bzz force-pushed the fix/spark-shell-guava branch from b17756f to 9a8ed37 on September 22, 2017 21:36
@bzz (Contributor, Author) commented Sep 22, 2017

The workaround is ready to be merged at will.

A fix for submitting an actual fatJar will be handled in a subsequent PR.

@erizocosmico (Contributor) commented

But even if we implement the spark-shell wrapper, we still have the same problem in the Python wrapper and when using the library programmatically, unless the user adds userClassPathFirst to the properties file, right?

@ajnavarro (Contributor) commented

I think all the errors fixed here are already fixed on the master branch. Close this PR if that is the case. Thanks!

@bzz (Contributor, Author) commented Oct 18, 2017

@ajnavarro great!

I might have missed it, but I could not find in master an analog of the end-to-end tests from 0a904e0.

@bzz closed this on Oct 18, 2017