missing jar files #100
Comments
You can try this:
git clone https://github.com/epfldata/lms.git
Now you should be able to generate code in dbtoaster-backend. If not, let me know.
Note: You can generate Spark code only for TPC-H queries.
Thank you very much. For a simplified TPC-H query that I created (joins lineitem, ), I got this error:
One other issue is confusing me. For the generated file, step 4 in the README is: "Compile the generated Spark program for the target execution environment."
This version does not support custom queries over the TPC-H schema. I have just updated the frontend to allow that.
You should be able to generate code for custom TPC-H queries.
To compile the generated code, you need to include the Spark jar files and the DBToaster runtime jar files. The latter you can find in the distribution under
Thank you very much, Milos, for your reply. I tried creating a conf directory (similar to the one in dbtoaster-backend/ddbtoaster/spark/conf), adding a configuration file with the paths changed, and then packaging it into a jar. However, the configuration parameters seem to be set somewhere else. I thought I would ask before changing the generated code to hard-code the location of the conf file.
See ddbtoaster/spark/conf/spark.config for various Spark configuration parameters.
Yes, I created a similar configuration file in my Spark project and changed the configuration parameters (the paths) to point to where my data and outputs should be. However, when I execute the jar file that I created, it still uses the default values from spark.config. So I assume these values are also set in one of the DBToaster libraries that I include when packaging my jar file.
Hi,
Hard to tell. You could try reshuffling your input so that each partition is non-empty.
Thanks, Milos, for your reply. I understand that the data are just the CSV files generated by the TPC-H data generator. Or am I missing something?
Yes, the input is the standard TPC-H files. If I remember correctly, the code expects the input to be randomly distributed across all nodes to avoid data skew. This would explain your error: your data might be stored in one partition while the others are empty. I would suggest running rdd.repartition after loading your input.
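For readers following along, the repartition suggestion can be sketched as below. This is only an illustration under stated assumptions, not the code DBToaster generates: the object name, the two helper functions, the partition count, and the placeholder path lineitem.csv are all hypothetical. Only SparkContext.textFile, RDD.repartition, RDD.mapPartitions, and RDD.collect are actual Spark calls.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object RepartitionSketch {
  // Hypothetical helper: load a text file and redistribute its lines
  // across numPartitions partitions with a full shuffle.
  def loadBalanced(sc: SparkContext, path: String, numPartitions: Int): RDD[String] =
    sc.textFile(path).repartition(numPartitions)

  // Hypothetical helper: number of records in each partition,
  // useful for spotting empty partitions before running the query.
  def partitionSizes[T](rdd: RDD[T]): Array[Int] =
    rdd.mapPartitions(it => Iterator(it.size)).collect()

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("repartition-sketch")
      .setIfMissing("spark.master", "local[4]") // local fallback for a quick check
    val sc = new SparkContext(conf)
    try {
      // Placeholder path; pass the real input location as the first argument.
      val path = args.headOption.getOrElse("lineitem.csv")
      val lineitem = loadBalanced(sc, path, numPartitions = 4)
      println(partitionSizes(lineitem).mkString(", "))
    } finally {
      sc.stop()
    }
  }
}
```

Printing the partition sizes before and after the repartition call is a cheap way to check whether empty partitions are plausibly behind the error. With very small inputs, some partitions may still end up empty.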
I am storing the data in HDFS, so it should be partitioned among the nodes of the cluster. I will try rdd.repartition.
Hi,
I am trying to generate Spark code. However, sbt returns an error:
[info] Loading project definition from /home/ec2-user/dbtoaster/dbtoaster-backend/project
[info] Set current project to dbtoaster (in build file:/home/ec2-user/dbtoaster/dbtoaster-backend/)
[info] Updating {file:/home/ec2-user/dbtoaster/dbtoaster-backend/}lms...
[info] Resolving EPFL#lms_2.11;0.3-SNAPSHOT ...
[warn] module not found: EPFL#lms_2.11;0.3-SNAPSHOT
[warn] ==== local: tried
[warn] /home/ec2-user/.ivy2/local/EPFL/lms_2.11/0.3-SNAPSHOT/ivys/ivy.xml
[warn] ==== local-preloaded-ivy: tried
[warn] /home/ec2-user/.sbt/preloaded/EPFL/lms_2.11/0.3-SNAPSHOT/ivys/ivy.xml
[warn] ==== local-preloaded: tried
[warn] file:////home/ec2-user/.sbt/preloaded/EPFL/lms_2.11/0.3-SNAPSHOT/lms_2.11-0.3-SNAPSHOT.pom
[warn] ==== public: tried
[warn] https://repo1.maven.org/maven2/EPFL/lms_2.11/0.3-SNAPSHOT/lms_2.11-0.3-SNAPSHOT.pom
[warn] ==== sonatype-snapshots: tried
[warn] https://oss.sonatype.org/content/repositories/snapshots/EPFL/lms_2.11/0.3-SNAPSHOT/lms_2.11-0.3-SNAPSHOT.pom
[info] Resolving jline#jline;2.12 ...
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn] :: UNRESOLVED DEPENDENCIES ::
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn] :: EPFL#lms_2.11;0.3-SNAPSHOT: not found
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn]
[warn] Note: Unresolved dependencies path:
[warn] EPFL:lms_2.11:0.3-SNAPSHOT (/home/ec2-user/dbtoaster/dbtoaster-backend/ddbtoaster/lms/build.sbt#L10-36)
[warn] +- ch.epfl.data:dbtoaster-lms_2.11:3.0
sbt.ResolveException: unresolved dependency: EPFL#lms_2.11;0.3-SNAPSHOT: not found
So it cannot resolve the dependency declared in build.sbt: "EPFL" %% "lms" % "0.3-SNAPSHOT"
How can I fix this?
P.S. When I run the unit tests, they fail with the same error.
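As a reading aid for the log above, the fragment below sketches how the quoted dependency maps to the artifact name sbt searched for. The dependency line is the one from the post. The scalaVersion value is an assumption inferred from the `_2.11` suffix in the log, not copied from the project's build files.

```scala
// build.sbt fragment (sketch only)

// Assumed value: any 2.11.x release yields the "_2.11" suffix seen in the log.
scalaVersion := "2.11.8"

// %% appends the Scala binary version to the artifact name ...
libraryDependencies += "EPFL" %% "lms" % "0.3-SNAPSHOT"

// ... so under Scala 2.11 it resolves the same module as:
// libraryDependencies += "EPFL" % "lms_2.11" % "0.3-SNAPSHOT"
```

The first location sbt tried, ~/.ivy2/local/EPFL/lms_2.11/0.3-SNAPSHOT/ivys/ivy.xml, is the local Ivy repository, which is where sbt's publishLocal task writes artifacts. The module was not found in any of the remote repositories listed either, so it presumably has to be built and published locally from a checkout of lms. The thread does not spell out that step, so treat it as an assumption.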