Skip to content

Commit

Permalink
[SPARK-26349][PYSPARK] Forbid insecure py4j gateways
Browse files Browse the repository at this point in the history
Spark always creates secure py4j connections between java and python,
but it also allows users to pass in their own connection. This ensures
that even passed in connections are secure.

Added test cases verifying the failure with a (mocked) insecure gateway.

This is closely related to SPARK-26019, but this entirely forbids the
insecure connection, rather than creating the "escape-hatch".

Closes #23441 from squito/SPARK-26349.

Authored-by: Imran Rashid <[email protected]>
Signed-off-by: Bryan Cutler <[email protected]>
  • Loading branch information
squito authored and BryanCutler committed Jan 8, 2019
1 parent e103c4a commit 32515d2
Show file tree
Hide file tree
Showing 2 changed files with 15 additions and 0 deletions.
5 changes: 5 additions & 0 deletions python/pyspark/context.py
Original file line number Diff line number Diff line change
Expand Up @@ -115,6 +115,11 @@ def __init__(self, master=None, appName=None, sparkHome=None, pyFiles=None,
ValueError:...
"""
self._callsite = first_spark_call() or CallSite(None, None, None)
if gateway is not None and gateway.gateway_parameters.auth_token is None:
raise ValueError(
"You are trying to pass an insecure Py4j gateway to Spark. This"
" is not allowed as it is a security risk.")

SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
try:
self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
Expand Down
10 changes: 10 additions & 0 deletions python/pyspark/tests/test_context.py
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@
import threading
import time
import unittest
from collections import namedtuple

from pyspark import SparkFiles, SparkContext
from pyspark.testing.utils import ReusedPySparkTestCase, PySparkTestCase, QuietTest, SPARK_HOME
Expand Down Expand Up @@ -246,6 +247,15 @@ def test_startTime(self):
with SparkContext() as sc:
self.assertGreater(sc.startTime, 0)

def test_forbid_insecure_gateway(self):
# Fail immediately if you try to create a SparkContext
# with an insecure gateway
parameters = namedtuple('MockGatewayParameters', 'auth_token')(None)
mock_insecure_gateway = namedtuple('MockJavaGateway', 'gateway_parameters')(parameters)
with self.assertRaises(ValueError) as context:
SparkContext(gateway=mock_insecure_gateway)
self.assertIn("insecure Py4j gateway", str(context.exception))


if __name__ == "__main__":
from pyspark.tests.test_context import *
Expand Down

0 comments on commit 32515d2

Please sign in to comment.