
[SPARK-19019][PYTHON] Fix hijacked collections.namedtuple and port cloudpickle changes for PySpark to work with Python 3.6.0 #16429

Closed
Wants to merge 2 commits

Conversation

@HyukjinKwon (Member) commented Dec 29, 2016

What changes were proposed in this pull request?

Currently, PySpark does not work with Python 3.6.0.

Running ./bin/pyspark simply throws the error as below and PySpark does not work at all:

Traceback (most recent call last):
  File ".../spark/python/pyspark/shell.py", line 30, in <module>
    import pyspark
  File ".../spark/python/pyspark/__init__.py", line 46, in <module>
    from pyspark.context import SparkContext
  File ".../spark/python/pyspark/context.py", line 36, in <module>
    from pyspark.java_gateway import launch_gateway
  File ".../spark/python/pyspark/java_gateway.py", line 31, in <module>
    from py4j.java_gateway import java_import, JavaGateway, GatewayClient
  File "<frozen importlib._bootstrap>", line 961, in _find_and_load
  File "<frozen importlib._bootstrap>", line 950, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 646, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 616, in _load_backward_compatible
  File ".../spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 18, in <module>
  File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pydoc.py", line 62, in <module>
    import pkgutil
  File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pkgutil.py", line 22, in <module>
    ModuleInfo = namedtuple('ModuleInfo', 'module_finder name ispkg')
  File ".../spark/python/pyspark/serializers.py", line 394, in namedtuple
    cls = _old_namedtuple(*args, **kwargs)
TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', 'rename', and 'module'

The root cause seems to be that some arguments of namedtuple became keyword-only arguments as of Python 3.6.0 (see https://bugs.python.org/issue25628).

We currently copy this function via types.FunctionType, which does not carry over the default values of keyword-only arguments (namedtuple.__kwdefaults__), leaving those arguments unbound in the copied function.

This PR proposes to work around this by manually re-applying the defaults via kwargs, as types.FunctionType does not seem to support setting them.
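To illustrate the mechanism, here is a minimal, self-contained sketch (using a hypothetical function `make` as a stand-in for `namedtuple`; this is an illustration, not the exact patch):

```
import types

# Hypothetical stand-in for collections.namedtuple in Python 3.6.0,
# which grew keyword-only arguments with defaults.
def make(name, *, verbose=False, rename=False, module=None):
    return (name, verbose, rename, module)

# types.FunctionType copies code, globals, name, positional defaults and
# closure -- but not __kwdefaults__, so the copy ends up with required
# keyword-only arguments that have no defaults.
copied = types.FunctionType(make.__code__, make.__globals__, make.__name__,
                            make.__defaults__, make.__closure__)
print(copied.__kwdefaults__)  # None
# copied("Point") would now raise:
#   TypeError: make() missing 3 required keyword-only arguments: ...

# Workaround: remember the original defaults (the attribute is absent in
# Python 2 and None in <= 3.5.x) and re-apply them via kwargs on each call.
kwdefaults = getattr(make, "__kwdefaults__", None) or {}

def wrapped(*args, **kwargs):
    for k, v in kwdefaults.items():
        kwargs.setdefault(k, v)
    return copied(*args, **kwargs)

print(wrapped("Point"))  # ('Point', False, False, None)
```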

Also, this PR ports the changes in cloudpickle for compatibility for Python 3.6.0.

How was this patch tested?

Manually tested with Python 2.7.6 and Python 3.6.0.

./bin/pyspark

plus manual creation of namedtuple both locally and in an RDD with Python 3.6.0,

and Jenkins tests for other Python versions.

Also,

./run-tests --python-executables=python3.6
Will test against the following Python executables: ['python3.6']
Will test the following Python modules: ['pyspark-core', 'pyspark-ml', 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming']
Finished test(python3.6): pyspark.sql.tests (192s)
Finished test(python3.6): pyspark.accumulators (3s)
Finished test(python3.6): pyspark.mllib.tests (198s)
Finished test(python3.6): pyspark.broadcast (3s)
Finished test(python3.6): pyspark.conf (2s)
Finished test(python3.6): pyspark.context (14s)
Finished test(python3.6): pyspark.ml.classification (21s)
Finished test(python3.6): pyspark.ml.evaluation (11s)
Finished test(python3.6): pyspark.ml.clustering (20s)
Finished test(python3.6): pyspark.ml.linalg.__init__ (0s)
Finished test(python3.6): pyspark.streaming.tests (240s)
Finished test(python3.6): pyspark.tests (240s)
Finished test(python3.6): pyspark.ml.recommendation (19s)
Finished test(python3.6): pyspark.ml.feature (36s)
Finished test(python3.6): pyspark.ml.regression (37s)
Finished test(python3.6): pyspark.ml.tuning (28s)
Finished test(python3.6): pyspark.mllib.classification (26s)
Finished test(python3.6): pyspark.mllib.evaluation (18s)
Finished test(python3.6): pyspark.mllib.clustering (44s)
Finished test(python3.6): pyspark.mllib.linalg.__init__ (0s)
Finished test(python3.6): pyspark.mllib.feature (26s)
Finished test(python3.6): pyspark.mllib.fpm (23s)
Finished test(python3.6): pyspark.mllib.random (8s)
Finished test(python3.6): pyspark.ml.tests (92s)
Finished test(python3.6): pyspark.mllib.stat.KernelDensity (0s)
Finished test(python3.6): pyspark.mllib.linalg.distributed (25s)
Finished test(python3.6): pyspark.mllib.stat._statistics (15s)
Finished test(python3.6): pyspark.mllib.recommendation (24s)
Finished test(python3.6): pyspark.mllib.regression (26s)
Finished test(python3.6): pyspark.profiler (9s)
Finished test(python3.6): pyspark.mllib.tree (16s)
Finished test(python3.6): pyspark.shuffle (1s)
Finished test(python3.6): pyspark.mllib.util (18s)
Finished test(python3.6): pyspark.serializers (11s)
Finished test(python3.6): pyspark.rdd (20s)
Finished test(python3.6): pyspark.sql.conf (8s)
Finished test(python3.6): pyspark.sql.catalog (17s)
Finished test(python3.6): pyspark.sql.column (18s)
Finished test(python3.6): pyspark.sql.context (18s)
Finished test(python3.6): pyspark.sql.group (27s)
Finished test(python3.6): pyspark.sql.dataframe (33s)
Finished test(python3.6): pyspark.sql.functions (35s)
Finished test(python3.6): pyspark.sql.types (6s)
Finished test(python3.6): pyspark.sql.streaming (13s)
Finished test(python3.6): pyspark.streaming.util (0s)
Finished test(python3.6): pyspark.sql.session (16s)
Finished test(python3.6): pyspark.sql.window (4s)
Finished test(python3.6): pyspark.sql.readwriter (35s)
Tests passed in 433 seconds

@HyukjinKwon (Member, Author) commented Dec 29, 2016

cc @davies and @JoshRosen. I know both of you are knowledgeable in this area. I am not sure whether this is the right direction (or the best fix), as some other third-party Python libraries dealing with function serialization do not seem to have fixed this yet (though an issue appears to have been opened by a maintainer). Would you mind taking a look?

@SparkQA commented Dec 29, 2016

Test build #70699 has finished for PR 16429 at commit fb04979.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon changed the title [WIP][SPARK-19019][PYTHON] Fix hijected collections.namedtuple to be serialized with keyword-only arguments → [WIP][SPARK-19019][PYTHON] Fix hijected collections.namedtuple to be serialized with keyword-only arguments in Python 3.6.0 (Dec 29, 2016)
@HyukjinKwon changed the title [WIP][SPARK-19019][PYTHON] Fix hijected collections.namedtuple to be serialized with keyword-only arguments in Python 3.6.0 → [WIP][SPARK-19019][PYTHON] Fix hijacked collections.namedtuple to be serialized with keyword-only arguments in Python 3.6.0 (Dec 29, 2016)
@SparkQA commented Dec 29, 2016

Test build #70702 has finished for PR 16429 at commit f4c56c8.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 29, 2016

Test build #70701 has finished for PR 16429 at commit 9ac9d01.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 29, 2016

Test build #70705 has finished for PR 16429 at commit b688e89.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon changed the title [WIP][SPARK-19019][PYTHON] Fix hijacked collections.namedtuple to be serialized with keyword-only arguments in Python 3.6.0 → [SPARK-19019][PYTHON] Fix hijacked collections.namedtuple to be serialized with keyword-only arguments in Python 3.6.0 (Dec 29, 2016)
@HyukjinKwon changed the title [SPARK-19019][PYTHON] Fix hijacked collections.namedtuple to be serialized with keyword-only arguments in Python 3.6.0 → [SPARK-19019][PYTHON] Fix hijacked collections.namedtuple to be serialized with keyword-only arguments to work PySpark with Python 3.6.0 (Dec 29, 2016)
@HyukjinKwon changed the title [SPARK-19019][PYTHON] Fix hijacked collections.namedtuple to be serialized with keyword-only arguments to work PySpark with Python 3.6.0 → [SPARK-19019][PYTHON] Fix hijacked collections.namedtuple to be serialized with keyword-only arguments for PySpark to work with Python 3.6.0 (Dec 29, 2016)

def _copy_func(f):
    return types.FunctionType(f.__code__, f.__globals__, f.__name__,
                              f.__defaults__, f.__closure__)

def _kwdefaults(f):
    kargs = getattr(f, "__kwdefaults__", None)
@HyukjinKwon (Member, Author) commented on this diff:
__kwdefaults__ can be None or not existing.

A contributor replied:
Could you put this comment into code?

@HyukjinKwon (Member, Author)

gentle ping..

@azmras commented Jan 3, 2017

After applying the patch, can you try to run
sc.parallelize(range(100), 8)
and confirm that it works? For me it does not,
and serialisation of objects goes crazy.

I had to go back to Python 3.5.2 for now.

Thanks for your efforts.

@HyukjinKwon (Member, Author) commented Jan 3, 2017

Thanks for your interest, @azmras. I just checked it, as below:

sc.parallelize(range(100), 8)
Traceback (most recent call last):
  File ".../spark/python/pyspark/cloudpickle.py", line 107, in dump
    return Pickler.dump(self, obj)
  File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pickle.py", line 409, in dump
    self.save(obj)
  File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pickle.py", line 751, in save_tuple
    save(element)
  File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File ".../spark/python/pyspark/cloudpickle.py", line 214, in save_function
    self.save_function_tuple(obj)
  File ".../spark/python/pyspark/cloudpickle.py", line 244, in save_function_tuple
    code, f_globals, defaults, closure, dct, base_globals = self.extract_func_data(func)
  File ".../spark/python/pyspark/cloudpickle.py", line 306, in extract_func_data
    func_global_refs = self.extract_code_globals(code)
  File ".../spark/python/pyspark/cloudpickle.py", line 288, in extract_code_globals
    out_names.add(names[oparg])
IndexError: tuple index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../spark/python/pyspark/rdd.py", line 198, in __repr__
    return self._jrdd.toString()
  File ".../spark/python/pyspark/rdd.py", line 2438, in _jrdd
    self._jrdd_deserializer, profiler)
  File ".../spark/python/pyspark/rdd.py", line 2371, in _wrap_function
    pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command)
  File ".../spark/python/pyspark/rdd.py", line 2357, in _prepare_for_python_RDD
    pickled_command = ser.dumps(command)
  File ".../spark/python/pyspark/serializers.py", line 452, in dumps
    return cloudpickle.dumps(obj, 2)
  File ".../spark/python/pyspark/cloudpickle.py", line 667, in dumps
    cp.dump(obj)
  File ".../spark/python/pyspark/cloudpickle.py", line 115, in dump
    if "'i' format requires" in e.message:
AttributeError: 'IndexError' object has no attribute 'message'

It looks like another issue with Python 3.6.0, which seems related to the cloudpickle module.

We should port cloudpipe/cloudpickle@cbd3f34
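(For context: the IndexError comes from cloudpickle's extract_code_globals walking raw bytecode by hand, which broke when CPython 3.6 switched to fixed-width two-byte "wordcode" instructions; the follow-up AttributeError happens because the error handler checks e.message, which Python 3 exceptions no longer have.) Below is a minimal sketch of the more robust approach, iterating instructions through the dis module as later cloudpickle versions do; it is an illustration, not the exact ported diff:

```
import dis
import types

def extract_code_globals(co):
    """Return the names of globals referenced by a code object."""
    names = set()
    # dis.get_instructions decodes the running interpreter's bytecode
    # format, so it survives the Python 3.6 wordcode change.
    for instr in dis.get_instructions(co):
        if instr.opname in ("LOAD_GLOBAL", "STORE_GLOBAL", "DELETE_GLOBAL"):
            names.add(instr.argval)
    # Recurse into nested code objects (inner functions, lambdas, ...).
    for const in co.co_consts:
        if isinstance(const, types.CodeType):
            names |= extract_code_globals(const)
    return names

def outer():
    return len(str(max(1, 2)))  # references the globals 'len', 'str', 'max'

print(sorted(extract_code_globals(outer.__code__)))  # ['len', 'max', 'str']
```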

@HyukjinKwon (Member, Author) commented Jan 3, 2017

Hi @JoshRosen and @davies, do you think that should be ported in this PR? I am worried about making this PR harder to review by porting it here.

@HyukjinKwon (Member, Author)

Hi @azmras, now it should work fine for your case as well.

@HyukjinKwon changed the title [SPARK-19019][PYTHON] Fix hijacked collections.namedtuple to be serialized with keyword-only arguments for PySpark to work with Python 3.6.0 → [SPARK-19019][PYTHON] Fix hijacked collections.namedtuple and port cloudpickle for PySpark to work with Python 3.6.0 (Jan 3, 2017)
@HyukjinKwon changed the title [SPARK-19019][PYTHON] Fix hijacked collections.namedtuple and port cloudpickle for PySpark to work with Python 3.6.0 → [SPARK-19019][PYTHON] Fix hijacked collections.namedtuple and port cloudpickle changes for PySpark to work with Python 3.6.0 (Jan 3, 2017)
@SparkQA commented Jan 3, 2017

Test build #70812 has finished for PR 16429 at commit 7b96546.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@azmras commented Jan 3, 2017

Thanks, it worked. While you are at it:
sc.parallelize(range(1000), 20).take(5)
is problematic.

The original problem comes back when you run an action on an RDD:
TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', 'rename', and 'module'

I will be thankful if you can look into it.

@HyukjinKwon (Member, Author)

@azmras Could you double-check? It works okay locally for me, as below:

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.2.0-SNAPSHOT
      /_/

Using Python version 3.6.0 (default, Dec 24 2016 00:01:50)
SparkSession available as 'spark'.
>>> sc.parallelize(range(100), 8).take(5)
[0, 1, 2, 3, 4]
>>> sc.parallelize(range(1000), 20).take(5)
[0, 1, 2, 3, 4]
>>>

@azmras commented Jan 4, 2017

Spark version 2.1.0
Using Python version 3.6.0 (default, Dec 24 2016 08:01:42)
SparkSession available as 'spark'.

sc.parallelize(range(1000), 20).take(5)
[0, 1, 2, 3, 4]

Thanks a lot, it is working now. I had to patch the zipped lib too.

@HyukjinKwon (Member, Author)

@azmras Thank you for confirming this.

@azmras commented Jan 4, 2017

Just checked other things (ml, sql, etc.) and everything is looking fine. I can safely say goodbye to Python 3.5 now.

Thank you.

@nbys commented Jan 4, 2017

After I applied your patch, I get this error:

Traceback (most recent call last):
  File "/usr/local/Cellar/apache-spark/2.1.0/libexec/python/pyspark/cloudpickle.py", line 107, in dump
    return Pickler.dump(self, obj)
  File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pickle.py", line 409, in dump
    self.save(obj)
  File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pickle.py", line 751, in save_tuple
    save(element)
  File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/Cellar/apache-spark/2.1.0/libexec/python/pyspark/cloudpickle.py", line 214, in save_function
    self.save_function_tuple(obj)
  File "/usr/local/Cellar/apache-spark/2.1.0/libexec/python/pyspark/cloudpickle.py", line 251, in save_function_tuple
    save((code, closure, base_globals))
  File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pickle.py", line 736, in save_tuple
    save(element)
  File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pickle.py", line 781, in save_list
    self._batch_appends(obj)
  File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pickle.py", line 805, in _batch_appends
    save(x)
  File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/Cellar/apache-spark/2.1.0/libexec/python/pyspark/cloudpickle.py", line 214, in save_function
    self.save_function_tuple(obj)
  File "/usr/local/Cellar/apache-spark/2.1.0/libexec/python/pyspark/cloudpickle.py", line 244, in save_function_tuple
    code, f_globals, defaults, closure, dct, base_globals = self.extract_func_data(func)
  File "/usr/local/Cellar/apache-spark/2.1.0/libexec/python/pyspark/cloudpickle.py", line 306, in extract_func_data
    func_global_refs = self.extract_code_globals(code)
  File "/usr/local/Cellar/apache-spark/2.1.0/libexec/python/pyspark/cloudpickle.py", line 288, in extract_code_globals
    out_names.add(names[oparg])
IndexError: tuple index out of range

Could you please take a look?

Regards,
Nikolay

@HyukjinKwon (Member, Author)

Could you check whether the patch is applied properly? That error is exactly what this PR fixes, and the line numbers in your traceback do not match the ones in this PR.

@azmras commented Jan 5, 2017

@cxww107 Try updating the patched files in both of the following locations:
/usr/local/Cellar/apache-spark/2.1.0/libexec/python/pyspark
/usr/local/Cellar/apache-spark/2.1.0/libexec/python/lib/pyspark.zip
See if it works.

Thanks

@HyukjinKwon (Member, Author)

ping @JoshRosen and @davies

@HyukjinKwon (Member, Author) commented Jan 15, 2017

Thank you @davies. The only added comments in the rebased commits are as below:

# __kwdefaults__ contains the default values of keyword-only arguments which are
# introduced from Python 3. The possible cases for __kwdefaults__ in namedtuple
# are as below:
#
# - Does not exist in Python 2.
# - Returns None in <= Python 3.5.x.
# - Returns a dictionary containing the default values to the keys from Python 3.6.x
#    (See https://bugs.python.org/issue25628).
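A quick way to observe the three cases described in the comment (the exact dict shown assumes Python 3.6.0's namedtuple signature):

```
from collections import namedtuple

print(getattr(namedtuple, "__kwdefaults__", None))
# Python 3.6.0   -> {'verbose': False, 'rename': False, 'module': None}
# <= Python 3.5  -> None
# Python 2       -> attribute does not exist, so getattr returns None
```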

@HyukjinKwon (Member, Author)

(I re-ran ./run-tests --python-executables=python3.6 to be sure.)

@SparkQA commented Jan 15, 2017

Test build #71396 has finished for PR 16429 at commit 6458d41.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon (Member, Author)

retest this please

@SparkQA commented Jan 15, 2017

Test build #71397 has finished for PR 16429 at commit 6458d41.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon (Member, Author)

@davies, could this be merged, by any chance?

@davies (Contributor) commented Jan 17, 2017

lgtm, merging into master and 2.1 branch.

asfgit pushed a commit that referenced this pull request Jan 17, 2017

[SPARK-19019][PYTHON] Fix hijacked collections.namedtuple and port cloudpickle changes for PySpark to work with Python 3.6.0

Author: hyukjinkwon <[email protected]>

Closes #16429 from HyukjinKwon/SPARK-19019.

(cherry picked from commit 20e6280)
Signed-off-by: Davies Liu <[email protected]>
@asfgit closed this in 20e6280 on Jan 17, 2017
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017

[SPARK-19019][PYTHON] Fix hijacked collections.namedtuple and port cloudpickle changes for PySpark to work with Python 3.6.0

Author: hyukjinkwon <[email protected]>

Closes apache#16429 from HyukjinKwon/SPARK-19019.
cmonkey pushed a commit to cmonkey/spark that referenced this pull request Feb 15, 2017

[SPARK-19019][PYTHON] Fix hijacked collections.namedtuple and port cloudpickle changes for PySpark to work with Python 3.6.0

Author: hyukjinkwon <[email protected]>

Closes apache#16429 from HyukjinKwon/SPARK-19019.
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Mar 21, 2017

[SPARK-19019][PYTHON] Fix hijacked collections.namedtuple and port cloudpickle changes for PySpark to work with Python 3.6.0

Author: hyukjinkwon <[email protected]>

Closes apache#16429 from HyukjinKwon/SPARK-19019.
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Mar 21, 2017

[SPARK-19019][PYTHON] Fix hijacked collections.namedtuple and port cloudpickle changes for PySpark to work with Python 3.6.0

Author: hyukjinkwon <[email protected]>

Closes apache#16429 from HyukjinKwon/SPARK-19019.
asfgit pushed a commit that referenced this pull request Apr 17, 2017
…e` and port cloudpickle changes for PySpark to work with Python 3.6.0

## What changes were proposed in this pull request?

This PR proposes to backport #16429 to branch-1.6 so that Python 3.6.0 works with Spark 1.6.x.

## How was this patch tested?

Manually, via

```
./run-tests --python-executables=python3.6
```

```
Finished test(python3.6): pyspark.conf (5s)
Finished test(python3.6): pyspark.broadcast (7s)
Finished test(python3.6): pyspark.accumulators (9s)
Finished test(python3.6): pyspark.rdd (16s)
Finished test(python3.6): pyspark.shuffle (0s)
Finished test(python3.6): pyspark.serializers (11s)
Finished test(python3.6): pyspark.profiler (5s)
Finished test(python3.6): pyspark.context (21s)
Finished test(python3.6): pyspark.ml.clustering (12s)
Finished test(python3.6): pyspark.ml.feature (16s)
Finished test(python3.6): pyspark.ml.classification (16s)
Finished test(python3.6): pyspark.ml.recommendation (16s)
Finished test(python3.6): pyspark.ml.tuning (14s)
Finished test(python3.6): pyspark.ml.regression (16s)
Finished test(python3.6): pyspark.ml.evaluation (12s)
Finished test(python3.6): pyspark.ml.tests (17s)
Finished test(python3.6): pyspark.mllib.classification (18s)
Finished test(python3.6): pyspark.mllib.evaluation (12s)
Finished test(python3.6): pyspark.mllib.feature (19s)
Finished test(python3.6): pyspark.mllib.linalg.__init__ (0s)
Finished test(python3.6): pyspark.mllib.fpm (12s)
Finished test(python3.6): pyspark.mllib.clustering (31s)
Finished test(python3.6): pyspark.mllib.random (8s)
Finished test(python3.6): pyspark.mllib.linalg.distributed (17s)
Finished test(python3.6): pyspark.mllib.recommendation (23s)
Finished test(python3.6): pyspark.mllib.stat.KernelDensity (0s)
Finished test(python3.6): pyspark.mllib.stat._statistics (13s)
Finished test(python3.6): pyspark.mllib.regression (22s)
Finished test(python3.6): pyspark.mllib.util (9s)
Finished test(python3.6): pyspark.mllib.tree (14s)
Finished test(python3.6): pyspark.sql.types (9s)
Finished test(python3.6): pyspark.sql.context (16s)
Finished test(python3.6): pyspark.sql.column (14s)
Finished test(python3.6): pyspark.sql.group (16s)
Finished test(python3.6): pyspark.sql.dataframe (25s)
Finished test(python3.6): pyspark.tests (164s)
Finished test(python3.6): pyspark.sql.window (6s)
Finished test(python3.6): pyspark.sql.functions (19s)
Finished test(python3.6): pyspark.streaming.util (0s)
Finished test(python3.6): pyspark.sql.readwriter (24s)
Finished test(python3.6): pyspark.sql.tests (38s)
Finished test(python3.6): pyspark.mllib.tests (133s)
Finished test(python3.6): pyspark.streaming.tests (189s)
Tests passed in 380 seconds
```

Author: hyukjinkwon <[email protected]>

Closes #17375 from HyukjinKwon/SPARK-19019-backport-1.6.
asfgit pushed a commit that referenced this pull request Apr 17, 2017
…e` and port cloudpickle changes for PySpark to work with Python 3.6.0

## What changes were proposed in this pull request?

This PR proposes to backport #16429 to branch-2.0 so that Python 3.6.0 works with Spark 2.0.x.

## How was this patch tested?

Manually, via

```
./run-tests --python-executables=python3.6
```

```
Finished test(python3.6): pyspark.tests (124s)
Finished test(python3.6): pyspark.accumulators (4s)
Finished test(python3.6): pyspark.broadcast (4s)
Finished test(python3.6): pyspark.conf (3s)
Finished test(python3.6): pyspark.context (15s)
Finished test(python3.6): pyspark.ml.classification (24s)
Finished test(python3.6): pyspark.sql.tests (190s)
Finished test(python3.6): pyspark.mllib.tests (190s)
Finished test(python3.6): pyspark.ml.clustering (14s)
Finished test(python3.6): pyspark.ml.linalg.__init__ (0s)
Finished test(python3.6): pyspark.ml.recommendation (18s)
Finished test(python3.6): pyspark.ml.feature (28s)
Finished test(python3.6): pyspark.ml.evaluation (28s)
Finished test(python3.6): pyspark.ml.regression (21s)
Finished test(python3.6): pyspark.ml.tuning (17s)
Finished test(python3.6): pyspark.streaming.tests (239s)
Finished test(python3.6): pyspark.mllib.evaluation (15s)
Finished test(python3.6): pyspark.mllib.classification (24s)
Finished test(python3.6): pyspark.mllib.clustering (37s)
Finished test(python3.6): pyspark.mllib.linalg.__init__ (0s)
Finished test(python3.6): pyspark.mllib.fpm (19s)
Finished test(python3.6): pyspark.mllib.feature (19s)
Finished test(python3.6): pyspark.mllib.random (8s)
Finished test(python3.6): pyspark.ml.tests (76s)
Finished test(python3.6): pyspark.mllib.stat.KernelDensity (0s)
Finished test(python3.6): pyspark.mllib.recommendation (21s)
Finished test(python3.6): pyspark.mllib.linalg.distributed (27s)
Finished test(python3.6): pyspark.mllib.regression (22s)
Finished test(python3.6): pyspark.mllib.stat._statistics (11s)
Finished test(python3.6): pyspark.mllib.tree (16s)
Finished test(python3.6): pyspark.profiler (8s)
Finished test(python3.6): pyspark.shuffle (1s)
Finished test(python3.6): pyspark.mllib.util (17s)
Finished test(python3.6): pyspark.serializers (12s)
Finished test(python3.6): pyspark.rdd (18s)
Finished test(python3.6): pyspark.sql.conf (4s)
Finished test(python3.6): pyspark.sql.catalog (14s)
Finished test(python3.6): pyspark.sql.column (13s)
Finished test(python3.6): pyspark.sql.context (15s)
Finished test(python3.6): pyspark.sql.group (26s)
Finished test(python3.6): pyspark.sql.dataframe (31s)
Finished test(python3.6): pyspark.sql.functions (32s)
Finished test(python3.6): pyspark.sql.types (5s)
Finished test(python3.6): pyspark.sql.streaming (11s)
Finished test(python3.6): pyspark.sql.window (5s)
Finished test(python3.6): pyspark.streaming.util (0s)
Finished test(python3.6): pyspark.sql.session (15s)
Finished test(python3.6): pyspark.sql.readwriter (34s)
Tests passed in 376 seconds
```

Author: hyukjinkwon <[email protected]>

Closes #17374 from HyukjinKwon/SPARK-19019-backport.
zzcclp pushed a commit to zzcclp/spark that referenced this pull request Apr 18, 2017

…e` and port cloudpickle changes for PySpark to work with Python 3.6.0 (backport of apache#16429 to branch-1.6)

Author: hyukjinkwon <[email protected]>

Closes apache#17375 from HyukjinKwon/SPARK-19019-backport-1.6.

(cherry picked from commit 6b315f3)
ilovezfs pushed a commit to liudangyi/homebrew-core that referenced this pull request Apr 18, 2017
@HyukjinKwon deleted the SPARK-19019 branch January 2, 2018 03:44
@jamal119

Hi @HyukjinKwon, have you solved your problem? My error is exactly the same as yours. Looking forward to your reply.

@HyukjinKwon (Member, Author)

This is fixed from Spark 1.6.4, 2.0.3, 2.1.1 and 2.2.0.

@jamal119

How did you end up solving that problem? Did you reinstall Spark? Which version did you install? Can you offer a final solution? Thanks.

@HyukjinKwon (Member, Author)

Install one of the Spark versions I mentioned above.
