An error occurred while trying to connect to the Java server #912

Closed
kaiseu opened this issue Jan 12, 2020 · 5 comments

@kaiseu

kaiseu commented Jan 12, 2020

When I try to run the object detection Jupyter demo under apps with the latest version, the error below occurs. Can anybody help with this? Thanks!

ERROR:root:Exception while sending command.
Traceback (most recent call last):
File "/opt/work/spark-2.4.3/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1159, in send_command
raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/work/spark-2.4.3/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 985, in send_command
response = connection.send_command(command)
File "/opt/work/spark-2.4.3/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1164, in send_command
"Error while receiving", e, proto.ERROR_ON_RECEIVE)
py4j.protocol.Py4JNetworkError: Error while receiving
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:41803)
Traceback (most recent call last):
File "/opt/work/spark-2.4.3/python/pyspark/rdd.py", line 816, in collect
sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
File "/opt/work/spark-2.4.3/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/opt/work/spark-2.4.3/python/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/opt/work/spark-2.4.3/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 336, in get_return_value
format(target_id, ".", name))
py4j.protocol.Py4JError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/work/spark-2.4.3/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 929, in _get_connection
connection = self.deque.pop()
IndexError: pop from an empty deque

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/work/spark-2.4.3/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1067, in start
self.socket.connect((self.address, self.port))
ConnectionRefusedError: [Errno 111] Connection refused
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:41803)
Traceback (most recent call last):
File "/opt/work/spark-2.4.3/python/pyspark/rdd.py", line 816, in collect
sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
File "/opt/work/spark-2.4.3/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/opt/work/spark-2.4.3/python/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/opt/work/spark-2.4.3/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 336, in get_return_value
format(target_id, ".", name))
py4j.protocol.Py4JError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/work/spark-2.4.3/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 929, in _get_connection
connection = self.deque.pop()
IndexError: pop from an empty deque

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/work/spark-2.4.3/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1067, in start
self.socket.connect((self.address, self.port))
ConnectionRefusedError: [Errno 111] Connection refused

@hkvision
Contributor

Hi @kaiseu

This is probably a memory issue. Are you using the video we provide? Could you try increasing the driver memory and executor memory? Thanks.
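
For reference, here is a minimal sketch of one way to raise both limits when Spark is launched from Python. `--driver-memory` and `--executor-memory` are standard spark-submit options, and `PYSPARK_SUBMIT_ARGS` is the environment variable PySpark reads when it starts the JVM. The helper name `build_submit_args` and the `8g` sizes are illustrative assumptions. Whether `init_nncontext` preserves an existing `PYSPARK_SUBMIT_ARGS` value is not confirmed in this thread.

```python
import os
import shlex


def build_submit_args(driver_memory, executor_memory, existing=""):
    """Return a PYSPARK_SUBMIT_ARGS value with memory flags added.

    The helper name and the sizes passed in are illustrative; the two
    flags themselves are standard spark-submit options.
    """
    # Drop any "pyspark-shell" token so it can be re-appended as the last one.
    tokens = [t for t in shlex.split(existing) if t != "pyspark-shell"]
    tokens += [
        "--driver-memory", driver_memory,
        "--executor-memory", executor_memory,
        "pyspark-shell",
    ]
    return " ".join(shlex.quote(t) for t in tokens)


if __name__ == "__main__":
    # This has to run before the SparkContext (and its JVM) is created.
    os.environ["PYSPARK_SUBMIT_ARGS"] = build_submit_args(
        "8g", "8g", os.environ.get("PYSPARK_SUBMIT_ARGS", ""))
    print(os.environ["PYSPARK_SUBMIT_ARGS"])
```

The same two flags can be passed directly on the command line when the job is started with spark-submit instead.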

@magic20191

magic20191 commented Aug 19, 2020

I had the same problem with the example from the documentation.
Environment:
CentOS 7
Memory: 4 GB
PyTorch 1.4.0 (CPU)
zoo: 0.9.0.dev0
P.S. This is a virtual machine with only one node, and Zoo was installed with pip.

import torch
import torch.nn as nn
from bigdl.optim.optimizer import Adam
from zoo.common.nncontext import *
from zoo.pipeline.api.net.torch_net import TorchNet
from zoo.pipeline.api.net.torch_criterion import TorchCriterion
from zoo.pipeline.nnframes import *
from pyspark.ml.linalg import Vectors
from pyspark.sql import SparkSession


# Two-layer network: 2 inputs -> 4 hidden units -> 1 sigmoid output
class SimpleTorchModel(nn.Module):
    def __init__(self):
        super(SimpleTorchModel, self).__init__()
        self.dense1 = nn.Linear(2, 4)
        self.dense2 = nn.Linear(4, 1)

    def forward(self, x):
        x = self.dense1(x)
        x = torch.sigmoid(self.dense2(x))
        return x


if __name__ == '__main__':
    sparkConf = init_spark_conf().setAppName("example_pytorch").setMaster('local[1]')
    sc = init_nncontext(sparkConf)
    spark = SparkSession \
        .builder \
        .getOrCreate()

    # Four-row toy dataset: a 2-element feature vector and a scalar label
    df = spark.createDataFrame(
        [(Vectors.dense([2.0, 1.0]), 1.0),
         (Vectors.dense([1.0, 2.0]), 0.0),
         (Vectors.dense([2.0, 1.0]), 1.0),
         (Vectors.dense([1.0, 2.0]), 0.0)],
        ["features", "label"])

    torch_model = SimpleTorchModel()
    torch_criterion = nn.MSELoss()

    # Wrap the PyTorch model and loss so that Analytics Zoo can run them
    az_model = TorchNet.from_pytorch(torch_model, [1, 2])
    az_criterion = TorchCriterion.from_pytorch(torch_criterion, [1, 1], [1, 1])

    classifier = NNClassifier(az_model, az_criterion) \
        .setBatchSize(4) \
        .setOptimMethod(Adam()) \
        .setLearningRate(0.01) \
        .setMaxEpoch(10)

    nnClassifierModel = classifier.fit(df)

    print("After training: ")
    res = nnClassifierModel.transform(df)
    res.show(10, False)

pyspark_submit_args is: --driver-class-path /root/anaconda3/lib/python3.6/site-packages/bigdl/share/lib/bigdl-0.10.0-jar-with-dependencies.jar:/root/anaconda3/lib/python3.6/site-packages/zoo/share/lib/analytics-zoo-bigdl_0.10.0-spark_2.4.3-0.9.0-SNAPSHOT-jar-with-dependencies.jar pyspark-shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/root/anaconda3/lib/python3.6/site-packages/zoo/share/lib/analytics-zoo-bigdl_0.10.0-spark_2.4.3-0.9.0-SNAPSHOT-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/anaconda3/lib/python3.6/site-packages/pyspark/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2020-08-19 00:01:41 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).

User settings:

KMP_AFFINITY=granularity=fine,compact,1,0
KMP_BLOCKTIME=0
KMP_SETTINGS=1
OMP_NUM_THREADS=1

Effective settings:

KMP_ABORT_DELAY=0
KMP_ADAPTIVE_LOCK_PROPS='1,1024'
KMP_ALIGN_ALLOC=64
KMP_ALL_THREADPRIVATE=128
KMP_ATOMIC_MODE=2
KMP_BLOCKTIME=0
KMP_CPUINFO_FILE: value is not defined
KMP_DETERMINISTIC_REDUCTION=false
KMP_DEVICE_THREAD_LIMIT=2147483647
KMP_DISP_HAND_THREAD=false
KMP_DISP_NUM_BUFFERS=7
KMP_DUPLICATE_LIB_OK=false
KMP_FORCE_REDUCTION: value is not defined
KMP_FOREIGN_THREADS_THREADPRIVATE=true
KMP_FORKJOIN_BARRIER='2,2'
KMP_FORKJOIN_BARRIER_PATTERN='hyper,hyper'
KMP_FORKJOIN_FRAMES=true
KMP_FORKJOIN_FRAMES_MODE=3
KMP_GTID_MODE=3
KMP_HANDLE_SIGNALS=false
KMP_HOT_TEAMS_MAX_LEVEL=1
KMP_HOT_TEAMS_MODE=0
KMP_INIT_AT_FORK=true
KMP_INIT_WAIT=2048
KMP_ITT_PREPARE_DELAY=0
KMP_LIBRARY=throughput
KMP_LOCK_KIND=queuing
KMP_MALLOC_POOL_INCR=1M
KMP_NEXT_WAIT=1024
KMP_NUM_LOCKS_IN_BLOCK=1
KMP_PLAIN_BARRIER='2,2'
KMP_PLAIN_BARRIER_PATTERN='hyper,hyper'
KMP_REDUCTION_BARRIER='1,1'
KMP_REDUCTION_BARRIER_PATTERN='hyper,hyper'
KMP_SCHEDULE='static,balanced;guided,iterative'
KMP_SETTINGS=true
KMP_SPIN_BACKOFF_PARAMS='4096,100'
KMP_STACKOFFSET=64
KMP_STACKPAD=0
KMP_STACKSIZE=4M
KMP_STORAGE_MAP=false
KMP_TASKING=2
KMP_TASKLOOP_MIN_TASKS=0
KMP_TASK_STEALING_CONSTRAINT=1
KMP_TEAMS_THREAD_LIMIT=1
KMP_TOPOLOGY_METHOD=all
KMP_USER_LEVEL_MWAIT=false
KMP_VERSION=false
KMP_WARNINGS=true
OMP_AFFINITY_FORMAT='OMP: pid %P tid %T thread %n bound to OS proc set {%a}'
OMP_ALLOCATOR=omp_default_mem_alloc
OMP_CANCELLATION=false
OMP_DEFAULT_DEVICE=0
OMP_DISPLAY_AFFINITY=false
OMP_DISPLAY_ENV=false
OMP_DYNAMIC=false
OMP_MAX_ACTIVE_LEVELS=2147483647
OMP_MAX_TASK_PRIORITY=0
OMP_NESTED=false
OMP_NUM_THREADS='1'
OMP_PLACES: value is not defined
OMP_PROC_BIND='intel'
OMP_SCHEDULE='static'
OMP_STACKSIZE=4M
OMP_TARGET_OFFLOAD=DEFAULT
OMP_THREAD_LIMIT=2147483647
OMP_TOOL=enabled
OMP_TOOL_LIBRARIES: value is not defined
OMP_WAIT_POLICY=PASSIVE
KMP_AFFINITY='noverbose,warnings,respect,granularity=fine,compact,1,0'

cls.getname: com.intel.analytics.bigdl.python.api.Sample
BigDLBasePickler registering: bigdl.util.common Sample
cls.getname: com.intel.analytics.bigdl.python.api.EvaluatedResult
BigDLBasePickler registering: bigdl.util.common EvaluatedResult
cls.getname: com.intel.analytics.bigdl.python.api.JTensor
BigDLBasePickler registering: bigdl.util.common JTensor
cls.getname: com.intel.analytics.bigdl.python.api.JActivity
BigDLBasePickler registering: bigdl.util.common JActivity
creating: createTorchNet
creating: createTorchCriterion
creating: createSeqToTensor
creating: createScalarToTensor
creating: createFeatureLabelPreprocessing
creating: createNNClassifier
creating: createAdam
TorchNet loading in TorchNet[bed9d76]
loading libgomp-8bba0e50.so.1
loading libc10.so
loading libcaffe2.so
loading libtorch.so.1
loading libpytorch-engine.so
terminate called after throwing an instance of 'c10::Error'
what(): [enforce fail at inline_container.cc:173] . file not found: model/model.json
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f4bf508b441 in /tmp/dlNativeLoader2178169977227902467libc10.so)
frame #1: c10::ThrowEnforceNotMet(char const*, int, char const*, std::string const&, void const*) + 0x49 (0x7f4bf508b259 in /tmp/dlNativeLoader2178169977227902467libc10.so)
frame #2: caffe2::serialize::PyTorchStreamReader::getFileID(std::string const&) + 0x52e (0x7f4bec339e5e in /tmp/dlNativeLoader1803792900704059554libcaffe2.so)
frame #3: caffe2::serialize::PyTorchStreamReader::getRecord(std::string const&) + 0x20 (0x7f4bec33a020 in /tmp/dlNativeLoader1803792900704059554libcaffe2.so)
frame #4: <unknown function> + 0xa7dc03 (0x7f4be9fdac03 in /tmp/dlNativeLoader5113198754580178383libtorch.so.1)
frame #5: torch::jit::load(std::unique_ptr<caffe2::serialize::ReadAdapterInterface, std::default_delete<caffe2::serialize::ReadAdapterInterface> >, c10::optional<c10::Device>, std::unordered_map<std::string, std::string, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::string> > >&) + 0x10d (0x7f4be9fdd0cd in /tmp/dlNativeLoader5113198754580178383libtorch.so.1)
frame #6: torch::jit::load(std::string const&, c10::optional<c10::Device>, std::unordered_map<std::string, std::string, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::string> > >&) + 0x68 (0x7f4be9fdd1f8 in /tmp/dlNativeLoader5113198754580178383libtorch.so.1)
frame #7: Java_com_intel_analytics_zoo_pipeline_api_net_PytorchModel_loadModelNative + 0x9c (0x7f4be923c85b in /tmp/dlNativeLoader4490405095027738473libpytorch-engine.so)
frame #8: [0x7f4c29018667]

ERROR:root:Exception while sending command.
Traceback (most recent call last):
File "/root/anaconda3/lib/python3.6/site-packages/py4j/java_gateway.py", line 1159, in send_command
raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/root/anaconda3/lib/python3.6/site-packages/py4j/java_gateway.py", line 985, in send_command
response = connection.send_command(command)
File "/root/anaconda3/lib/python3.6/site-packages/py4j/java_gateway.py", line 1164, in send_command
"Error while receiving", e, proto.ERROR_ON_RECEIVE)
py4j.protocol.Py4JNetworkError: Error while receiving
Traceback (most recent call last):
File "", line 22, in
File "/root/anaconda3/lib/python3.6/site-packages/pyspark/ml/base.py", line 132, in fit
return self._fit(dataset)
File "/root/anaconda3/lib/python3.6/site-packages/pyspark/ml/wrapper.py", line 295, in _fit
java_model = self._fit_java(dataset)
File "/root/anaconda3/lib/python3.6/site-packages/pyspark/ml/wrapper.py", line 292, in _fit_java
return self._java_obj.fit(dataset._jdf)
File "/root/anaconda3/lib/python3.6/site-packages/py4j/java_gateway.py", line 1257, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/root/anaconda3/lib/python3.6/site-packages/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/root/anaconda3/lib/python3.6/site-packages/py4j/protocol.py", line 336, in get_return_value
format(target_id, ".", name))
py4j.protocol.Py4JError: An error occurred while calling o84.fit

@hkvision
Contributor

You need to increase the memory. See here for how to set a larger value: https://analytics-zoo.github.io/master/#PythonUserGuide/run/#run-after-pip-install
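
Before choosing a larger value, it can help to check what the machine can actually hold, since the report above comes from a 4 GB virtual machine. The following is a minimal sketch, not taken from the linked guide: it parses a JVM-style size string such as `8g` and compares it with the physical RAM reported by POSIX `sysconf`. All function names are hypothetical, and treating sizes as binary units (1g = 1024^3 bytes) follows the usual JVM convention.

```python
import os

# Binary multipliers for JVM-style size suffixes.
_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}


def parse_jvm_size(text):
    """Convert a size such as '512m' or '4g' to bytes (a suffix is required)."""
    text = text.strip().lower()
    if len(text) < 2 or text[-1] not in _UNITS:
        raise ValueError("expected a number followed by k, m, g or t: %r" % text)
    return int(text[:-1]) * _UNITS[text[-1]]


def physical_memory_bytes():
    """Total RAM as reported by POSIX sysconf (works on Linux, e.g. CentOS 7)."""
    return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")


def fits_in_ram(requested, total_bytes):
    """Return True if the requested size is no larger than total_bytes."""
    return parse_jvm_size(requested) <= total_bytes


if __name__ == "__main__":
    total = physical_memory_bytes()
    for size in ("2g", "8g"):
        verdict = "fits in" if fits_in_ram(size, total) else "exceeds"
        print("%s %s physical memory" % (size, verdict))
```

A request that exceeds physical RAM means the virtual machine itself needs more memory before a larger Spark setting can take effect.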

@helenlly

@kaiseu Thanks for your question. Please let us know if you have any more questions; otherwise we may go ahead and close it.

@helenlly

@kaiseu We'll close the issue; you can reopen it if needed. Thanks.

@liu-shaojun liu-shaojun transferred this issue from intel-analytics/BigDL-2.x Mar 5, 2024