[SPARK-24713] AppMaster of Spark Streaming Kafka OOM if there are hundreds of topics consumed #21690
Can one of the admins verify this patch? |
@koeninger Sorry to interrupt, would you please review my patch? Thanks in advance. |
Jenkins, ok to test |
@yuanboliu Can you clarify why repeated pause is necessary? |
Test build #92528 has finished for PR 21690 at commit
|
@koeninger Thanks for your review. The first pause is used to stop poll() in the method paranoidPoll |
@yuanboliu From reading KafkaConsumer code, and from testing, I don't see where consumer.position() alone would un-pause topicpartitions. See below. Can you give a counter-example? I am seeing poll() reset the paused state. When you are having the problem, are you seeing the info level log messages "poll(0) returned messages"? If that's what's happening, I think the best we can do is call pause() in only one place, the first line of paranoidPoll, e.g.
Here's what I saw in testing:
|
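For readers following the thread, here is a minimal sketch of the "pause in only one place, the first line of paranoidPoll" idea from the comment above. It is not the patch itself. Spark's real method is Scala and works on a Kafka consumer, while this stand-in is plain Java with hypothetical types (`MiniConsumer`, `Rec`, `FakeConsumer`) so it runs without a Kafka dependency. The seek-back step is an assumption about how the "poll(0) returned messages" case is compensated for.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ParanoidPollSketch {

    /** Hypothetical record: a partition name plus an offset. */
    static final class Rec {
        final String partition;
        final long offset;
        Rec(String partition, long offset) {
            this.partition = partition;
            this.offset = offset;
        }
    }

    /** Hypothetical subset of a consumer API, loosely modeled on KafkaConsumer. */
    interface MiniConsumer {
        Set<String> assignment();
        void pause(Set<String> partitions);
        List<Rec> poll(long timeoutMs);
        void seek(String partition, long offset);
    }

    /**
     * Pause all assigned partitions once, as the first step, then poll.
     * If records come back anyway, seek each affected partition back to the
     * smallest returned offset (assumed compensation step).
     * Returns the partitions that had to be rewound and their offsets.
     */
    static Map<String, Long> paranoidPoll(MiniConsumer c) {
        c.pause(c.assignment());
        List<Rec> msgs = c.poll(0);
        Map<String, Long> minOffsets = new HashMap<>();
        for (Rec m : msgs) {
            minOffsets.merge(m.partition, m.offset, Long::min);
        }
        for (Map.Entry<String, Long> e : minOffsets.entrySet()) {
            c.seek(e.getKey(), e.getValue());
        }
        return minOffsets;
    }

    /** In-memory fake, used only to exercise the sketch. */
    static final class FakeConsumer implements MiniConsumer {
        final Set<String> paused = new HashSet<>();
        final Map<String, Long> positions = new HashMap<>();
        final List<Rec> leaked = new ArrayList<>(); // records poll() hands back regardless
        boolean pausedBeforePoll = false;

        public Set<String> assignment() { return positions.keySet(); }
        public void pause(Set<String> partitions) { paused.addAll(partitions); }
        public List<Rec> poll(long timeoutMs) {
            // Record whether every assigned partition was paused when poll ran.
            pausedBeforePoll = paused.containsAll(positions.keySet());
            return leaked;
        }
        public void seek(String partition, long offset) { positions.put(partition, offset); }
    }

    public static void main(String[] args) {
        FakeConsumer c = new FakeConsumer();
        c.positions.put("topicA-0", 12L);
        c.leaked.add(new Rec("topicA-0", 7L));
        c.leaked.add(new Rec("topicA-0", 5L));
        Map<String, Long> rewound = paranoidPoll(c);
        System.out.println(rewound + " pausedBeforePoll=" + c.pausedBeforePoll);
    }
}
```

The only point the sketch makes is ordering: `pause` is called once, before `poll`, rather than being repeated after later consumer calls.
|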
@koeninger Thanks for the details. Sorry, I've been quite busy this week. I will delete the last pause, test the patch on my own cluster this weekend, and give feedback ASAP. |
@yuanboliu What I'm suggesting is more like this: https://github.com/apache/spark/compare/master...koeninger:SPARK-24713?expand=1 |
@koeninger Thanks for your reply. I agree with you: there is no need to call pause repeatedly. I will update my patch shortly. |
Test build #92709 has finished for PR 21690 at commit
|
@koeninger Sorry to interrupt, could you take a look at my patch? |
What results are you seeing?
|
After applying this patch, my application runs successfully. The issue can occur when many topics (hundreds) are consumed. |
LGTM, merging to master. Thanks! |
Thanks very much |
We have hundreds of Kafka topics that need to be consumed in one application. The application master throws an OOM exception after hanging for nearly half an hour.
The OOM happens in an environment with a large number of topics, and such an environment is not convenient to set up in a unit test, so I didn't change or add a test case.