
Create sample command fails in Google Dataproc Spark 2.11.8 #163

Open
sanjayio opened this issue Jul 17, 2018 · 5 comments
Comments


sanjayio commented Jul 17, 2018

I am getting the error
java.io.IOException: Mkdirs failed to create file:/home/sanjay/spark-warehouse/default_verdict.db/vt23_1/.hive-staging_hive_2018-07-17_03-03-28_842_6156432897141230125-1/-ext-10000/_temporary/0/_temporary/attempt_20180717030333_0002_m_000016_3
when I run the command vc.sql("create sample of default.advertiser_apr_orc").show(false).
I am running Dataproc image 1.2 with Scala 2.11.8 and verdict-spark-lib-0.4.8.jar. I run this command as the root user and have run chmod 755 on /home/sanjay/.

@sanjayio (Author)

I also tried the exact configuration given in the documentation, with the Dataproc 1.0 image and verdict-core-0.3.0-jar-with-dependencies.jar. When I run the create sample command I get this error:
org.apache.hadoop.hive.common.FileUtils: Creating directory if it doesn't exist: hdfs://cluster-16-m/user/hive/warehouse/null_verdict.db/vt66_4/.hive-staging_hive_2018-07-17_07-25-01_583_8460634515176170110-1
java.lang.NullPointerException
    at edu.umich.verdict.util.StringManipulations.quoteString(StringManipulations.java:132)
    at edu.umich.verdict.dbms.DbmsSpark.insertEntry(DbmsSpark.java:138)
    at edu.umich.verdict.dbms.Dbms.insertSampleNameEntryIntoDBMS(Dbms.java:480)
    at edu.umich.verdict.dbms.DbmsSpark.updateSampleNameEntryIntoDBMS(DbmsSpark.java:146)
    at edu.umich.verdict.VerdictMeta.insertSampleInfo(VerdictMeta.java:200)
    at edu.umich.verdict.query.CreateSampleQuery.createUniformRandomSample(CreateSampleQuery.java:120)
    at edu.umich.verdict.query.CreateSampleQuery.buildSamples(CreateSampleQuery.java:57)
    at edu.umich.verdict.query.CreateSampleQuery.buildSamples(CreateSampleQuery.java:81)
    at edu.umich.verdict.query.CreateSampleQuery.compute(CreateSampleQuery.java:39)
    at edu.umich.verdict.query.Query.computeDataFrame(Query.java:107)
    at edu.umich.verdict.VerdictSparkHiveContext.execute(VerdictSparkHiveContext.java:40)
    at edu.umich.verdict.VerdictContext.executeSparkQuery(VerdictContext.java:125)
    at edu.umich.verdict.VerdictContext.sql(VerdictContext.java:131)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
    at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
    at $iwC$$iwC$$iwC.<init>(<console>:44)
    at $iwC$$iwC.<init>(<console>:46)
    at $iwC.<init>(<console>:48)
    at <init>(<console>:50)
    at .<init>(<console>:54)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
    at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
    at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
    at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
    at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
    at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
    at org.apache.spark.repl.Main$.main(Main.scala:31)
    at org.apache.spark.repl.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)


sanjayio commented Jul 17, 2018

I tried building the master branch (verdict-spark-lib-0.4.11.jar) and ran it on a fresh Google Dataproc 1.2 instance. Even there, when I run

scala> import edu.umich.verdict.VerdictSpark2Context
scala> val vc = new VerdictSpark2Context(sc)
scala> vc.sql("show databases").show(false)
scala> vc.sql("create sample of default.advertiser_06_01_orc").show(false)

I get the following error:
org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Unable to create database path file:/home/sanjay/spark-warehouse/default_verdict.db, failed to create database default_verdict);
    at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
    at org.apache.spark.sql.hive.HiveExternalCatalog.doCreateDatabase(HiveExternalCatalog.scala:163)
    at org.apache.spark.sql.catalyst.catalog.ExternalCatalog.createDatabase(ExternalCatalog.scala:69)
    at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createDatabase(SessionCatalog.scala:219)
    at org.apache.spark.sql.execution.command.CreateDatabaseCommand.run(ddl.scala:66)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67)
    at org.apache.spark.sql.Dataset.<init>(Dataset.scala:183)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:68)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:632)
    at edu.umich.verdict.dbms.DbmsSpark2.execute(DbmsSpark2.java:84)
    at edu.umich.verdict.dbms.DbmsSpark2.executeUpdate(DbmsSpark2.java:91)
    at edu.umich.verdict.dbms.Dbms.createCatalog(Dbms.java:192)
    at edu.umich.verdict.dbms.Dbms.createDatabase(Dbms.java:183)
    at edu.umich.verdict.query.CreateSampleQuery.buildSamples(CreateSampleQuery.java:93)
    at edu.umich.verdict.query.CreateSampleQuery.compute(CreateSampleQuery.java:64)
    at edu.umich.verdict.query.Query.computeDataset(Query.java:192)
    at edu.umich.verdict.VerdictSpark2Context.execute(VerdictSpark2Context.java:61)
    at edu.umich.verdict.VerdictContext.executeSpark2Query(VerdictContext.java:160)
    at edu.umich.verdict.VerdictSpark2Context.sql(VerdictSpark2Context.java:81)

What does this error mean?
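[Editor's note, not part of the original thread: both stack traces point at file:/home/sanjay/spark-warehouse, which suggests spark.sql.warehouse.dir is falling back to a local path that exists only on the driver, not on the worker nodes. This is an inference, not something confirmed in the thread; a minimal diagnostic sketch, assuming a Spark 2.x spark-shell session and an illustrative HDFS warehouse path:]

```scala
// Inside spark-shell (Spark 2.x): inspect where Spark will create databases
// such as default_verdict.db. With no Hive override, this defaults to
// file:<working-dir>/spark-warehouse on the driver's local filesystem.
spark.conf.get("spark.sql.warehouse.dir")

// If it reports a local file: path, relaunch the shell with the warehouse on
// shared storage. The HDFS path below is illustrative; substitute the
// cluster's actual Hive warehouse directory:
//
//   spark-shell --jars verdict-spark-lib-0.4.11.jar \
//     --conf spark.sql.warehouse.dir=hdfs:///user/hive/warehouse
```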

pyongjoo (Member) commented Jul 17, 2018 via email

@sanjayio (Author)

@pyongjoo I think you are right. I am not able to create the schema either; I get the same error when I try that. How can I resolve this issue?

pyongjoo (Member) commented Jul 18, 2018 via email
