Clarify docstring for Pyspark's foreachPartition #2895

Closed

Conversation

tdhopper

Because `foreachPartition` is implemented on top of `mapPartitions`, which requires a function that maps a partition to a partition, the function passed to `foreachPartition` must be a generator function or return an iterable (although these results are discarded).

This is currently not stated in the documentation except through the unexplained example. Stating it would help users understand that example and avoid wasting time on this error:

```
TypeError: 'NoneType' object is not iterable
```
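
To make the failure mode concrete, here is a minimal plain-Python sketch that runs without Spark. `run_partition` is a hypothetical stand-in for whatever consumes the function's return value for one partition; it is an assumption for illustration, not PySpark's actual code.

```python
# Simplified model of the behaviour described above; no Spark required.
# `run_partition` is a hypothetical helper, not part of PySpark.

def run_partition(f, partition):
    """Call f on the partition's iterator and drain whatever it returns."""
    result = f(iter(partition))
    for _ in result:  # fails here if f returned None
        pass

seen = []

def returns_none(iterator):
    # Ordinary function: performs side effects, implicitly returns None.
    for x in iterator:
        seen.append(x)

def generator_version(iterator):
    # Generator function: calling it returns an iterable generator object,
    # and the body runs while that generator is drained.
    for x in iterator:
        seen.append(x)
    yield None

def returns_iterable(iterator):
    # Ordinary function that returns an (empty) iterable explicitly.
    for x in iterator:
        seen.append(x)
    return []

# The plain function triggers the TypeError; the other two run cleanly.
try:
    run_partition(returns_none, [1, 2])
    error = None
except TypeError as exc:
    error = str(exc)

run_partition(generator_version, [3, 4])
run_partition(returns_iterable, [5, 6])
```

Note that in this model the side effects of `returns_none` have already happened by the time the error is raised, since the function body runs before its `None` result is iterated.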
@AmplabJenkins

Can one of the admins verify this patch?

@JoshRosen
Contributor

Actually, we might want to just fix this and allow foreachPartition and foreach to accept those types of UDFs, too.
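
One way such a fix could look, as a rough sketch only: wrap the user function so that a non-iterable return value is treated as an empty result. The names `tolerate_non_iterable` and `run_partition` are hypothetical, and this is not claimed to be the patch that was actually merged.

```python
# Sketch of accepting functions that return None; names are hypothetical.

def tolerate_non_iterable(f):
    """Wrap f so a non-iterable return value is treated as 'no results'."""
    def wrapped(iterator):
        result = f(iterator)
        try:
            return iter(result)
        except TypeError:
            # f returned None (or another non-iterable); nothing to yield.
            return iter([])
    return wrapped

def run_partition(f, partition):
    """Stand-in for the caller that drains the function's result."""
    for _ in f(iter(partition)):
        pass

written = []

def write_all(iterator):
    # Plain function with side effects and no return value.
    for x in iterator:
        written.append(x)

def write_all_gen(iterator):
    # Generator-style function, which keeps working when wrapped.
    for x in iterator:
        written.append(x)
    yield None

run_partition(tolerate_non_iterable(write_all), [1, 2, 3])
run_partition(tolerate_non_iterable(write_all_gen), [4, 5])
```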

@tdhopper
Author

Oh. Now that I look at master, @JoshRosen, I see that it's already been fixed by @davis here. The fix just isn't in 1.1. I guess we should close this?

@JoshRosen
Contributor

Maybe we can backport SPARK-2871 to 1.1, since it looks like it also fixes a bunch of preservesPartitioning bugs.

@JoshRosen
Contributor

@davies Do you think we should backport #2093 to branch-1.1 in order to fix this issue?

@davies
Contributor

davies commented Oct 23, 2014

@JoshRosen It would be better if we could easily backport them.

@tdhopper
Author

I'd love to see this happen.

@davis

davis commented Oct 23, 2014

Ah - tdhopper, i think you meant @davies :)

@JoshRosen
Contributor

@davis I think someone mentioned you instead of @davies (an off-by-one-character error).

@JoshRosen
Contributor

(Imagine what @Override's notification inbox must look like...)

@JoshRosen
Contributor

If you don't mind, could you close this PR since it has been subsumed by another commit? If we want to track the progress / backport status of a different fix, then we should do that in JIRA.

@tdhopper
Author

@JoshRosen: Yup. Thanks.

@tdhopper tdhopper closed this Dec 16, 2014