An error occurred while trying to connect to the Java server #912

kaiseu · 2020-01-12T07:56:55Z

When trying to run the object detection Jupyter demo under apps with the latest version, the error below occurs. Can anybody help with this? Thanks!

ERROR:root:Exception while sending command.
Traceback (most recent call last):
File "/opt/work/spark-2.4.3/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1159, in send_command
raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/work/spark-2.4.3/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 985, in send_command
response = connection.send_command(command)
File "/opt/work/spark-2.4.3/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1164, in send_command
"Error while receiving", e, proto.ERROR_ON_RECEIVE)
py4j.protocol.Py4JNetworkError: Error while receiving
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:41803)
Traceback (most recent call last):
File "/opt/work/spark-2.4.3/python/pyspark/rdd.py", line 816, in collect
sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
File "/opt/work/spark-2.4.3/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/opt/work/spark-2.4.3/python/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/opt/work/spark-2.4.3/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 336, in get_return_value
format(target_id, ".", name))
py4j.protocol.Py4JError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/work/spark-2.4.3/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 929, in _get_connection
connection = self.deque.pop()
IndexError: pop from an empty deque

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/work/spark-2.4.3/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1067, in start
self.socket.connect((self.address, self.port))
ConnectionRefusedError: [Errno 111] Connection refused
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:41803)
Traceback (most recent call last):
File "/opt/work/spark-2.4.3/python/pyspark/rdd.py", line 816, in collect
sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
File "/opt/work/spark-2.4.3/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/opt/work/spark-2.4.3/python/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/opt/work/spark-2.4.3/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 336, in get_return_value
format(target_id, ".", name))
py4j.protocol.Py4JError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/work/spark-2.4.3/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 929, in _get_connection
connection = self.deque.pop()
IndexError: pop from an empty deque

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/work/spark-2.4.3/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1067, in start
self.socket.connect((self.address, self.port))
ConnectionRefusedError: [Errno 111] Connection refused

hkvision · 2020-01-13T00:56:28Z

Hi @kaiseu

This is probably a memory issue. Are you using the video we provide? Could you try increasing the driver memory and executor memory? Thanks.
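For a notebook session, one way to try the suggestion above is to request larger heaps through `PYSPARK_SUBMIT_ARGS`, which PySpark reads when it launches the JVM. This is only a minimal sketch: the helper name `build_submit_args` and the `8g` values are assumptions for illustration, and whether Analytics Zoo keeps a pre-set `PYSPARK_SUBMIT_ARGS` when it adds its own class path has not been verified here.

```python
import os


def build_submit_args(driver_memory="8g", executor_memory="8g"):
    """Return a PYSPARK_SUBMIT_ARGS value that requests larger JVM heaps.

    The sizes are placeholders (assumption); pick values that fit the machine.
    """
    return (
        f"--driver-memory {driver_memory} "
        f"--executor-memory {executor_memory} pyspark-shell"
    )


# This must run before the first SparkContext (and hence the JVM) is created,
# e.g. in the first notebook cell after restarting the kernel.
os.environ["PYSPARK_SUBMIT_ARGS"] = build_submit_args()
```

In local mode the executors run inside the driver JVM, so the driver setting is the one that matters there.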

magic20191 · 2020-08-19T04:15:20Z

I had the same problem with the example from the documentation.
Environment:
CentOS 7
Memory: 4 GB
PyTorch 1.4.0 (CPU)
zoo: 0.9.0.dev0
PS: Virtual machine environment with only one node; Zoo is installed via pip.

import torch
import torch.nn as nn
from bigdl.optim.optimizer import Adam
from zoo.common.nncontext import *
from zoo.pipeline.api.net.torch_net import TorchNet
from zoo.pipeline.api.net.torch_criterion import TorchCriterion
from zoo.pipeline.nnframes import *
from pyspark.ml.linalg import Vectors
from pyspark.sql import SparkSession

class SimpleTorchModel(nn.Module):
    def __init__(self):
        super(SimpleTorchModel, self).__init__()
        self.dense1 = nn.Linear(2, 4)
        self.dense2 = nn.Linear(4, 1)

    def forward(self, x):
        x = self.dense1(x)
        x = torch.sigmoid(self.dense2(x))
        return x

if __name__ == '__main__':
    sparkConf = init_spark_conf().setAppName("example_pytorch").setMaster('local[1]')
    sc = init_nncontext(sparkConf)
    spark = SparkSession \
        .builder \
        .getOrCreate()
    df = spark.createDataFrame(
        [(Vectors.dense([2.0, 1.0]), 1.0),
         (Vectors.dense([1.0, 2.0]), 0.0),
         (Vectors.dense([2.0, 1.0]), 1.0),
         (Vectors.dense([1.0, 2.0]), 0.0)],
        ["features", "label"])
    torch_model = SimpleTorchModel()
    torch_criterion = nn.MSELoss()
    az_model = TorchNet.from_pytorch(torch_model, [1, 2])
    az_criterion = TorchCriterion.from_pytorch(torch_criterion, [1, 1], [1, 1])
    classifier = NNClassifier(az_model, az_criterion) \
        .setBatchSize(4) \
        .setOptimMethod(Adam()) \
        .setLearningRate(0.01) \
        .setMaxEpoch(10)
    nnClassifierModel = classifier.fit(df)
    print("After training: ")
    res = nnClassifierModel.transform(df)
    res.show(10, False)

pyspark_submit_args is: --driver-class-path /root/anaconda3/lib/python3.6/site-packages/bigdl/share/lib/bigdl-0.10.0-jar-with-dependencies.jar:/root/anaconda3/lib/python3.6/site-packages/zoo/share/lib/analytics-zoo-bigdl_0.10.0-spark_2.4.3-0.9.0-SNAPSHOT-jar-with-dependencies.jar pyspark-shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/root/anaconda3/lib/python3.6/site-packages/zoo/share/lib/analytics-zoo-bigdl_0.10.0-spark_2.4.3-0.9.0-SNAPSHOT-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/anaconda3/lib/python3.6/site-packages/pyspark/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2020-08-19 00:01:41 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).

User settings:

KMP_AFFINITY=granularity=fine,compact,1,0
KMP_BLOCKTIME=0
KMP_SETTINGS=1
OMP_NUM_THREADS=1

Effective settings:

KMP_ABORT_DELAY=0
KMP_ADAPTIVE_LOCK_PROPS='1,1024'
KMP_ALIGN_ALLOC=64
KMP_ALL_THREADPRIVATE=128
KMP_ATOMIC_MODE=2
KMP_BLOCKTIME=0
KMP_CPUINFO_FILE: value is not defined
KMP_DETERMINISTIC_REDUCTION=false
KMP_DEVICE_THREAD_LIMIT=2147483647
KMP_DISP_HAND_THREAD=false
KMP_DISP_NUM_BUFFERS=7
KMP_DUPLICATE_LIB_OK=false
KMP_FORCE_REDUCTION: value is not defined
KMP_FOREIGN_THREADS_THREADPRIVATE=true
KMP_FORKJOIN_BARRIER='2,2'
KMP_FORKJOIN_BARRIER_PATTERN='hyper,hyper'
KMP_FORKJOIN_FRAMES=true
KMP_FORKJOIN_FRAMES_MODE=3
KMP_GTID_MODE=3
KMP_HANDLE_SIGNALS=false
KMP_HOT_TEAMS_MAX_LEVEL=1
KMP_HOT_TEAMS_MODE=0
KMP_INIT_AT_FORK=true
KMP_INIT_WAIT=2048
KMP_ITT_PREPARE_DELAY=0
KMP_LIBRARY=throughput
KMP_LOCK_KIND=queuing
KMP_MALLOC_POOL_INCR=1M
KMP_NEXT_WAIT=1024
KMP_NUM_LOCKS_IN_BLOCK=1
KMP_PLAIN_BARRIER='2,2'
KMP_PLAIN_BARRIER_PATTERN='hyper,hyper'
KMP_REDUCTION_BARRIER='1,1'
KMP_REDUCTION_BARRIER_PATTERN='hyper,hyper'
KMP_SCHEDULE='static,balanced;guided,iterative'
KMP_SETTINGS=true
KMP_SPIN_BACKOFF_PARAMS='4096,100'
KMP_STACKOFFSET=64
KMP_STACKPAD=0
KMP_STACKSIZE=4M
KMP_STORAGE_MAP=false
KMP_TASKING=2
KMP_TASKLOOP_MIN_TASKS=0
KMP_TASK_STEALING_CONSTRAINT=1
KMP_TEAMS_THREAD_LIMIT=1
KMP_TOPOLOGY_METHOD=all
KMP_USER_LEVEL_MWAIT=false
KMP_VERSION=false
KMP_WARNINGS=true
OMP_AFFINITY_FORMAT='OMP: pid %P tid %T thread %n bound to OS proc set {%a}'
OMP_ALLOCATOR=omp_default_mem_alloc
OMP_CANCELLATION=false
OMP_DEFAULT_DEVICE=0
OMP_DISPLAY_AFFINITY=false
OMP_DISPLAY_ENV=false
OMP_DYNAMIC=false
OMP_MAX_ACTIVE_LEVELS=2147483647
OMP_MAX_TASK_PRIORITY=0
OMP_NESTED=false
OMP_NUM_THREADS='1'
OMP_PLACES: value is not defined
OMP_PROC_BIND='intel'
OMP_SCHEDULE='static'
OMP_STACKSIZE=4M
OMP_TARGET_OFFLOAD=DEFAULT
OMP_THREAD_LIMIT=2147483647
OMP_TOOL=enabled
OMP_TOOL_LIBRARIES: value is not defined
OMP_WAIT_POLICY=PASSIVE
KMP_AFFINITY='noverbose,warnings,respect,granularity=fine,compact,1,0'

cls.getname: com.intel.analytics.bigdl.python.api.Sample
BigDLBasePickler registering: bigdl.util.common Sample
cls.getname: com.intel.analytics.bigdl.python.api.EvaluatedResult
BigDLBasePickler registering: bigdl.util.common EvaluatedResult
cls.getname: com.intel.analytics.bigdl.python.api.JTensor
BigDLBasePickler registering: bigdl.util.common JTensor
cls.getname: com.intel.analytics.bigdl.python.api.JActivity
BigDLBasePickler registering: bigdl.util.common JActivity
creating: createTorchNet
creating: createTorchCriterion
creating: createSeqToTensor
creating: createScalarToTensor
creating: createFeatureLabelPreprocessing
creating: createNNClassifier
creating: createAdam
TorchNet loading in TorchNet[bed9d76]
loading libgomp-8bba0e50.so.1
loading libc10.so
loading libcaffe2.so
loading libtorch.so.1
loading libpytorch-engine.so
terminate called after throwing an instance of 'c10::Error'
what(): [enforce fail at inline_container.cc:173] . file not found: model/model.json
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f4bf508b441 in /tmp/dlNativeLoader2178169977227902467libc10.so)
frame #1: c10::ThrowEnforceNotMet(char const*, int, char const*, std::string const&, void const*) + 0x49 (0x7f4bf508b259 in /tmp/dlNativeLoader2178169977227902467libc10.so)
frame #2: caffe2::serialize::PyTorchStreamReader::getFileID(std::string const&) + 0x52e (0x7f4bec339e5e in /tmp/dlNativeLoader1803792900704059554libcaffe2.so)
frame #3: caffe2::serialize::PyTorchStreamReader::getRecord(std::string const&) + 0x20 (0x7f4bec33a020 in /tmp/dlNativeLoader1803792900704059554libcaffe2.so)
frame #4: + 0xa7dc03 (0x7f4be9fdac03 in /tmp/dlNativeLoader5113198754580178383libtorch.so.1)
frame #5: torch::jit::load(std::unique_ptr<caffe2::serialize::ReadAdapterInterface, std::default_deletecaffe2::serialize::ReadAdapterInterface >, c10::optionalc10::Device, std::unordered_map<std::string, std::string, std::hashstd::string, std::equal_tostd::string, std::allocator<std::pair<std::string const, std::string> > >&) + 0x10d (0x7f4be9fdd0cd in /tmp/dlNativeLoader5113198754580178383libtorch.so.1)
frame #6: torch::jit::load(std::string const&, c10::optionalc10::Device, std::unordered_map<std::string, std::string, std::hashstd::string, std::equal_tostd::string, std::allocator<std::pair<std::string const, std::string> > >&) + 0x68 (0x7f4be9fdd1f8 in /tmp/dlNativeLoader5113198754580178383libtorch.so.1)
frame #7: Java_com_intel_analytics_zoo_pipeline_api_net_PytorchModel_loadModelNative + 0x9c (0x7f4be923c85b in /tmp/dlNativeLoader4490405095027738473libpytorch-engine.so)
frame #8: [0x7f4c29018667]

ERROR:root:Exception while sending command.
Traceback (most recent call last):
File "/root/anaconda3/lib/python3.6/site-packages/py4j/java_gateway.py", line 1159, in send_command
raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/root/anaconda3/lib/python3.6/site-packages/py4j/java_gateway.py", line 985, in send_command
response = connection.send_command(command)
File "/root/anaconda3/lib/python3.6/site-packages/py4j/java_gateway.py", line 1164, in send_command
"Error while receiving", e, proto.ERROR_ON_RECEIVE)
py4j.protocol.Py4JNetworkError: Error while receiving
Traceback (most recent call last):
File "", line 22, in
File "/root/anaconda3/lib/python3.6/site-packages/pyspark/ml/base.py", line 132, in fit
return self._fit(dataset)
File "/root/anaconda3/lib/python3.6/site-packages/pyspark/ml/wrapper.py", line 295, in _fit
java_model = self._fit_java(dataset)
File "/root/anaconda3/lib/python3.6/site-packages/pyspark/ml/wrapper.py", line 292, in _fit_java
return self._java_obj.fit(dataset._jdf)
File "/root/anaconda3/lib/python3.6/site-packages/py4j/java_gateway.py", line 1257, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/root/anaconda3/lib/python3.6/site-packages/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/root/anaconda3/lib/python3.6/site-packages/py4j/protocol.py", line 336, in get_return_value
format(target_id, ".", name))
py4j.protocol.Py4JError: An error occurred while calling o84.fit

hkvision · 2020-08-20T03:11:07Z

You need to increase the memory. Refer to this page to set a larger memory: https://analytics-zoo.github.io/master/#PythonUserGuide/run/#run-after-pip-install
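For a pip-installed setup, a rough sketch of raising the driver memory from Python is shown below. It relies on Spark's launcher falling back to the `SPARK_DRIVER_MEMORY` environment variable when no explicit driver memory is passed. The helper name `set_driver_memory` and the `2g` value are assumptions for illustration, not a recommended size.

```python
import os


def set_driver_memory(size, env=None):
    """Set SPARK_DRIVER_MEMORY in the given environment mapping.

    `size` uses JVM-style units such as "2g" or "512m". The variable only
    takes effect if it is set before the Spark JVM is launched.
    """
    env = os.environ if env is None else env
    env["SPARK_DRIVER_MEMORY"] = size
    return env


# Placeholder value (assumption); call this before init_nncontext(...).
set_driver_memory("2g")
```

Setting the variable in the shell with `export SPARK_DRIVER_MEMORY=2g` before starting Python should be equivalent.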

helenlly · 2020-09-15T06:53:24Z

@kaiseu Thanks for your question. Please let us know if you have any more questions; otherwise we may go ahead and close it.

helenlly · 2021-01-18T01:59:50Z

@kaiseu We'll close the issue; you can re-open it if needed. Thanks.

helenlly assigned helenlly and hkvision Sep 15, 2020

helenlly closed this as completed Jan 18, 2021

liu-shaojun transferred this issue from intel-analytics/BigDL-2.x Mar 5, 2024
