[SPARK-6667] [PySpark] remove setReuseAddress
Reusing the address on the server side caused the server to fail to acknowledge incoming connections, so remove it.

This PR retries once after a timeout; it also adds a timeout on the client side.
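As a rough illustration of the approach described above, here is a minimal Python sketch of a client connect with a 3-second timeout and a single retry. The helper name connect_with_retry and its structure are assumptions for illustration only; the actual change in this commit is the one-line addition shown in the python/pyspark/rdd.py diff below.

    # Sketch only, not the actual PySpark code: connect with a timeout and retry once.
    import socket

    def connect_with_retry(port, timeout=3, retries=1):
        last_error = None
        for attempt in range(retries + 1):
            sock = socket.socket()
            sock.settimeout(timeout)          # fail fast instead of hanging forever
            try:
                sock.connect(("localhost", port))
                return sock                   # caller is responsible for closing it
            except socket.timeout as e:
                last_error = e
                sock.close()                  # clean up before retrying
        raise last_error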

Author: Davies Liu <[email protected]>

Closes apache#5324 from davies/collect_hang and squashes the following commits:

e5a51a2 [Davies Liu] remove setReuseAddress
7977c2f [Davies Liu] do retry on client side
b838f35 [Davies Liu] retry after timeout
Davies Liu authored and JoshRosen committed Apr 2, 2015
1 parent 424e987 commit 0cce545
Showing 2 changed files with 1 addition and 1 deletion.
@@ -605,7 +605,6 @@ private[spark] object PythonRDD extends Logging {
    */
   private def serveIterator[T](items: Iterator[T], threadName: String): Int = {
     val serverSocket = new ServerSocket(0, 1)
-    serverSocket.setReuseAddress(true)
     // Close the socket if no connection in 3 seconds
     serverSocket.setSoTimeout(3000)
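For context, the Scala hunk above serves an iterator over a fresh ServerSocket bound to an ephemeral port with a backlog of 1 and a 3-second accept timeout, and no longer sets SO_REUSEADDR. A minimal Python sketch of that server-side pattern follows; it only mirrors the logic for illustration and is not the Spark implementation.

    # Sketch of the server-side pattern: ephemeral port, backlog 1,
    # 3-second accept timeout, and no SO_REUSEADDR.
    import socket

    server = socket.socket()
    server.bind(("localhost", 0))   # port 0 lets the OS pick a free ephemeral port
    server.listen(1)                # allow at most one pending connection
    server.settimeout(3)            # give up if no client connects within 3 seconds
    port = server.getsockname()[1]  # the chosen port is handed to the client
    try:
        conn, _ = server.accept()   # raises socket.timeout after 3 seconds
    finally:
        server.close()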
1 change: 1 addition & 0 deletions python/pyspark/rdd.py
@@ -113,6 +113,7 @@ def _parse_memory(s):
 
 def _load_from_socket(port, serializer):
     sock = socket.socket()
+    sock.settimeout(3)
     try:
         sock.connect(("localhost", port))
         rf = sock.makefile("rb", 65536)
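With the timeout set, sock.connect (and reads through the file object returned by makefile) can raise socket.timeout after roughly 3 seconds instead of blocking indefinitely, which matches the 3-second accept timeout on the server side.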
