Run apollo on minikube: Answer from Java side is empty #30

Open
r0mainK opened this issue Mar 7, 2018 · 0 comments

r0mainK (Contributor) commented Mar 7, 2018

Issue description

I want to run apollo on a Kubernetes staging cluster, so I wanted to test it out locally on minikube first. I used Helm charts to bring up a local Spark cluster, a Scylla DB and bblfshd. I then created an image for apollo, available here, as well as a Kubernetes Service so it would connect to ports 7077 (Spark master), 9042 (Scylla/CQL) and 9432 (bblfshd). After creating the pod I ran the `resetdb` command, and it worked. I cloned the engine repo in order to get example siva files, which I put in `io/siva`. Then I tried to run the `bags` command: Spark launches and registers the job (I checked the logs on the master and worker pods, as well as the UI), and then I got this error:
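For reference, a Service exposing one of those ports might look roughly like this (a sketch only, not the actual service.yaml from the setup; the name matches the `--bblfsh babel-bblfshd` flag used later, but the selector label is an assumption about the chart):

```yaml
# Hypothetical Service for bblfshd; one analogous Service
# per backend (Spark master on 7077, Scylla on 9042).
apiVersion: v1
kind: Service
metadata:
  name: babel-bblfshd
spec:
  selector:
    app: babel-bblfshd   # assumed pod label from the bblfshd chart
  ports:
    - name: grpc
      port: 9432
      targetPort: 9432
```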

```
INFO:engine:Initializing on io/siva
INFO:MetadataSaver:Ignition -> DzhigurdaFiles -> UastExtractor -> Moder -> Cacher -> MetadataSaver
ERROR:root:Exception while sending command.
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 1062, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 908, in send_command
    response = connection.send_command(command)
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 1067, in send_command
    "Error while receiving", e, proto.ERROR_ON_RECEIVE)
py4j.protocol.Py4JNetworkError: Error while receiving
Traceback (most recent call last):
  File "/usr/local/bin/apollo", line 11, in <module>
    load_entry_point('apollo', 'console_scripts', 'apollo')()
  File "/packages/apollo/apollo/__main__.py", line 230, in main
    return handler(args)
  File "/packages/apollo/apollo/bags.py", line 94, in source2bags
    cache_hook=lambda: MetadataSaver(args.keyspace, args.tables["meta"]))
  File "/packages/sourced/ml/utils/engine.py", line 147, in wrapped_pause
    return func(cmdline_args, *args, **kwargs)
  File "/packages/sourced/ml/cmd_entries/repos2bow.py", line 35, in repos2bow_entry_template
    uast_extractor.link(cache_hook()).execute()
  File "/packages/sourced/ml/transformers/transformer.py", line 95, in execute
    head = node(head)
  File "/packages/apollo/apollo/bags.py", line 46, in __call__
    rows.toDF() \
  File "/spark/python/pyspark/sql/session.py", line 58, in toDF
    return sparkSession.createDataFrame(self, schema, sampleRatio)
  File "/spark/python/pyspark/sql/session.py", line 582, in createDataFrame
    rdd, schema = self._createFromRDD(data.map(prepare), schema, samplingRatio)
  File "/spark/python/pyspark/sql/session.py", line 380, in _createFromRDD
    struct = self._inferSchema(rdd, samplingRatio)
  File "/spark/python/pyspark/sql/session.py", line 351, in _inferSchema
    first = rdd.first()
  File "/spark/python/pyspark/rdd.py", line 1361, in first
    rs = self.take(1)
  File "/spark/python/pyspark/rdd.py", line 1343, in take
    res = self.context.runJob(self, takeUpToNumLeft, p)
  File "/spark/python/pyspark/context.py", line 992, in runJob
    port = self._jvm.PythonRDD.runJob(self._jsc.sc(), mappedRDD._jrdd, partitions)
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 1160, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/spark/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/usr/local/lib/python3.5/dist-packages/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name))
py4j.protocol.Py4JError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob
ERROR:root:Exception while sending command.
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 1062, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty
```

Steps to Reproduce (for bugs)

  • Set up minikube and helm
  • Clone the charts repo
  • Create pods, services, etc.: `helm install scylla --name=scylla`, `helm install spark --name=anything`, `helm install bblfshd --name=babel`, `kubectl create -f service.yaml`, `kubectl run -ti --image=r0maink/apollo apollo-test`
  • Open a new tab and log in to the Spark master with `kubectl exec -it anything-master /bin/bash`, then do: `export PYSPARK_PYTHON=python3` and `export PYSPARK_DRIVER_PYTHON=python3`
  • Go back to the previous tab (it should be logged in on the apollo pod) and run `apollo resetdb --cassandra scylla:9042`
  • Get the siva files: `apt update`, `apt install git`, `git clone https://github.com/src-d/engine`, `mkdir io`, `mkdir io/bags`, `cp -r engine/examples/siva_files io/siva`

And finally: `apollo bags -r io/siva --bow io/bags/bow.asdf --docfreq io/bags/docfreq.asdf -f id -f lit -f uast2seq --uast2seq-seq-len 4 -l Java --min-docfreq 5 --bblfsh babel-bblfshd --cassandra scylla:9042 --persist MEMORY_ONLY -s spark://anything-master:7077`
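As a side note on the two exports in the steps above: the gateway error "Answer from Java side is empty" usually means the JVM side of the Py4J bridge died (driver/executor out-of-memory is a frequent culprit), and a driver/worker Python version mismatch produces similar failures, which is presumably why both variables are pinned to python3. The driver-side variable PySpark reads is `PYSPARK_DRIVER_PYTHON`. A minimal sketch (plain Python, no cluster needed) of pinning both sides programmatically, which must happen before any SparkContext is created:

```python
import os

# Pin both halves of PySpark to the same interpreter *before*
# a SparkContext exists; mismatched driver/worker Python
# versions are a classic source of Py4J gateway failures.
os.environ["PYSPARK_PYTHON"] = "python3"          # workers/executors
os.environ["PYSPARK_DRIVER_PYTHON"] = "python3"   # the driver itself

# Both sides now point at the same interpreter name.
print(os.environ["PYSPARK_PYTHON"], os.environ["PYSPARK_DRIVER_PYTHON"])
```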

Any ideas?
