Issue description

I want to run apollo on a k8s staging cluster, so I wanted to test it out locally on minikube first. I used helm charts to bring up a local Spark cluster, Scylla DB and bblfshd. I then created an image for apollo (available here), as well as a k8s service so the pod can reach ports 7077, 9042 and 9432. After creating the pod I ran the `resetdb` command and it worked. I cloned the engine repo to get the example siva files, which I put in `io/siva`. Then I tried to run the `bags` command: Spark launches and registers the job (I checked the logs on the master and worker pods, as well as the UI), and then I got this error:
INFO:engine:Initializing on io/siva
INFO:MetadataSaver:Ignition -> DzhigurdaFiles -> UastExtractor -> Moder -> Cacher -> MetadataSaver
ERROR:root:Exception while sending command.
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 1062, in send_command
raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 908, in send_command
response = connection.send_command(command)
File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 1067, in send_command
"Error while receiving", e, proto.ERROR_ON_RECEIVE)
py4j.protocol.Py4JNetworkError: Error while receiving
Traceback (most recent call last):
File "/usr/local/bin/apollo", line 11, in <module>
load_entry_point('apollo', 'console_scripts', 'apollo')()
File "/packages/apollo/apollo/__main__.py", line 230, in main
return handler(args)
File "/packages/apollo/apollo/bags.py", line 94, in source2bags
cache_hook=lambda: MetadataSaver(args.keyspace, args.tables["meta"]))
File "/packages/sourced/ml/utils/engine.py", line 147, in wrapped_pause
return func(cmdline_args, *args, **kwargs)
File "/packages/sourced/ml/cmd_entries/repos2bow.py", line 35, in repos2bow_entry_template
uast_extractor.link(cache_hook()).execute()
File "/packages/sourced/ml/transformers/transformer.py", line 95, in execute
head = node(head)
File "/packages/apollo/apollo/bags.py", line 46, in __call__
rows.toDF() \
File "/spark/python/pyspark/sql/session.py", line 58, in toDF
return sparkSession.createDataFrame(self, schema, sampleRatio)
File "/spark/python/pyspark/sql/session.py", line 582, in createDataFrame
rdd, schema = self._createFromRDD(data.map(prepare), schema, samplingRatio)
File "/spark/python/pyspark/sql/session.py", line 380, in _createFromRDD
struct = self._inferSchema(rdd, samplingRatio)
File "/spark/python/pyspark/sql/session.py", line 351, in _inferSchema
first = rdd.first()
File "/spark/python/pyspark/rdd.py", line 1361, in first
rs = self.take(1)
File "/spark/python/pyspark/rdd.py", line 1343, in take
res = self.context.runJob(self, takeUpToNumLeft, p)
File "/spark/python/pyspark/context.py", line 992, in runJob
port = self._jvm.PythonRDD.runJob(self._jsc.sc(), mappedRDD._jrdd, partitions)
File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 1160, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/spark/python/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/usr/local/lib/python3.5/dist-packages/py4j/protocol.py", line 328, in get_return_value
format(target_id, ".", name))
py4j.protocol.Py4JError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob
ERROR:root:Exception while sending command.
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 1062, in send_command
raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty
Steps to Reproduce (for bugs)

1. `helm install scylla --name=scylla`, `helm install spark --name=anything`, `helm install bblfshd --name=babel`, `kubectl create -f service.yaml`, `kubectl run -ti --image=r0maink/apollo apollo-test`
2. Open a new tab and log into the Spark master with `kubectl exec -it anything-master /bin/bash`, then do `export PYSPARK_PYTHON=python3` and `export PYSPARK_PYTHON_DRIVER=python3`
3. Go back to the previous tab, which should be logged into the apollo pod, and run `apollo resetdb --cassandra scylla:9042`
4. Get the siva files: `apt update`, `apt install git`, `git clone https://github.com/src-d/engine`, `mkdir io`, `mkdir io/bags`, `cp engine/examples/siva_files io/siva`
5. And finally: `apollo bags -r io/siva --bow io/bags/bow.asdf --docfreq io/bags/docfreq.asdf -f id -f lit -f uast2seq --uast2seq-seq-len 4 -l Java --min-docfreq 5 --bblfsh babel-bblfshd --cassandra scylla:9042 --persist MEMORY_ONLY -s spark://anything-master:7077` (a bare PySpark check against the same master is sketched after this list)
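To help narrow this down, here is a minimal PySpark job that can be run from the apollo pod against the same master. This is only a sketch: the app name is arbitrary, and it assumes `pyspark` is importable in the pod (the traceback above suggests it is, since it goes through `/spark/python/pyspark`). It follows the same `toDF()` path the traceback bottoms out in, and also prints which Python the executors run, since the exports in step 2 suggest a 2/3 mismatch was a concern:

```python
import sys

from pyspark.sql import Row, SparkSession

# bare PySpark session against the same standalone master used by apollo
spark = (SparkSession.builder
         .master("spark://anything-master:7077")
         .appName("py4j-smoke-test")  # arbitrary name
         .getOrCreate())

# which Python do the executors actually run, compared to the driver?
print("driver:", sys.version)
print("executors:", spark.sparkContext.range(2)
      .map(lambda _: sys.version).distinct().collect())

# same path apollo takes in bags.py: toDF() infers the schema, which calls
# rdd.first() and therefore runs a job on the executors
rdd = spark.sparkContext.parallelize([Row(repo="example", value=1)])
rdd.toDF().show()

spark.stop()
```

If even this bare job dies with the same "Answer from Java side is empty" error, the problem would be in the Spark/pod setup rather than in apollo itself.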
Any ideas?