pubsub: app with default settings eventually runs out of memory (but does not crash) #2810

Closed
jeanbza opened this issue Mar 18, 2018 · 22 comments
@jeanbza

jeanbza commented Mar 18, 2018

Environment details

  • OS: alpine 3.6
  • Node.js version: 9.8.0
  • npm version: 5.6.0
  • google-cloud-node version: @google-cloud/[email protected]

Steps to reproduce

Run the following on GKE, pump 30 million messages into pubsub:

const PubSub = require('@google-cloud/pubsub')
const express = require('express')

console.log('Node starting up')

const PROJECT_ID = 'some-project'
const TOPIC_ID = 'some-topic'
const SUBSCRIPTION_ID = 'some-node-subscription'

const pubsub = new PubSub({
  projectId: PROJECT_ID,
})

const topic = pubsub.topic(TOPIC_ID)
topic.createSubscription(SUBSCRIPTION_ID, function (err, subscription) {
  if (err) {
    console.log('Node subscription already created, moving on')
  }
})
const subscription = topic.subscription(SUBSCRIPTION_ID)

subscription.on('error', function (err) {
  console.error('Node error!', err)
})

function onMessage(message) {
  console.log(`Node received ${message.id} ${message.data}`)
  setTimeout(function () {
    console.log('Node acking')
    message.ack()
  }, 5000)
}

console.log('Node receiving..')

subscription.on('message', onMessage)

// because I have no idea how to wait forever..
const app = express()
app.listen(3000, () => console.log('Example app listening on port 3000!'))

// Remove the listener from receiving `message` events.
// subscription.removeListener('message', onMessage)

Once the app runs out of memory (1.5GB), reception slows from ~20k/min to something like 200/min:

[Screenshot: 2018-03-17, 5:58:42 pm]

(removing the timeout, it still runs out of memory)

Let me know if I'm missing something silly here.

@callmehiphop
Contributor

@jadekler sorry you're running into this problem. This is actually a duplicate of googleapis/nodejs-pubsub#13 and should be resolved by googleapis/nodejs-pubsub#92 if you'd like to test out that fork.

I'm going to close this, but please feel free to subscribe to the issue/PR mentioned above, and thanks for reporting!

@jeanbza
Author

jeanbza commented Mar 23, 2018

Hi @callmehiphop. This issue seems to be better, but not resolved. Here is the graph with latest numbers; note that it still runs out of memory and the consumption rate (although better) is still a trickle. By trickle, I mean 0-100 or so acks per minute, whereas our C# client libraries are doing around 100k per minute (Go does around 12k per minute, for a more middle-of-the-road number).

[Screenshot: 2018-03-23, 5:14:27 pm]

@callmehiphop
Contributor

@jadekler are you applying any flow control limits to this test? If you have a ton of pending messages and aren't capping the number of messages to be in process then you will definitely run out of memory pretty quickly.

const subscription = topic.subscription('my-sub', {
  flowControl: {maxMessages: 50}
});

When we investigated as to why PubSub was using so much memory, we found that writing to the server was sometimes slow and if we did not apply an upper bound to the number of messages we processed we would run out of memory because of all the pending ack/modack requests.
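The effect described above can be illustrated with a small model. This is only a sketch under stated assumptions, not the library's implementation: `simulate` and its parameters (`ticks`, `arrivalsPerTick`, `acksPerTick`) are hypothetical names, and the only idea borrowed from the library is a `maxMessages` cap on in-flight messages. It shows that when acks drain more slowly than messages arrive, the in-flight set grows without bound unless it is capped.

```javascript
// Toy model: messages arrive faster than ack writes complete.
// `maxMessages` caps how many messages may be in flight at once;
// Infinity models the unbounded behaviour discussed in this thread.
function simulate({ticks, arrivalsPerTick, acksPerTick, maxMessages}) {
  let inFlight = 0; // delivered but not yet acked; each one holds memory
  let peak = 0;     // largest in-flight count observed
  let backlog = 0;  // messages still waiting on the server side

  for (let t = 0; t < ticks; t++) {
    backlog += arrivalsPerTick;

    // Deliver only as many messages as the flow-control window allows.
    const room = maxMessages - inFlight;
    const delivered = Math.min(backlog, room);
    backlog -= delivered;
    inFlight += delivered;
    peak = Math.max(peak, inFlight);

    // Slow ack writes drain the in-flight set at a fixed rate.
    inFlight -= Math.min(inFlight, acksPerTick);
  }
  return {peak, backlog};
}

const rates = {ticks: 100, arrivalsPerTick: 1000, acksPerTick: 100};
const unbounded = simulate(Object.assign({maxMessages: Infinity}, rates));
const bounded = simulate(Object.assign({maxMessages: 50}, rates));

console.log('unbounded peak in-flight:', unbounded.peak);
console.log('bounded peak in-flight:', bounded.peak);
```

In the bounded run the surplus stays in the backlog (that is, on the server) instead of in the subscriber's memory, which is the trade-off a `maxMessages` limit makes.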

Ideally we would solve this by decreasing our write times, but I'm not sure how we would go about that? @murgatroid99 are we able to configure how server requests are buffered within a gRPC duplex stream?

@jeanbza
Author

jeanbza commented Mar 24, 2018

@callmehiphop That makes sense to me. Is there currently no flowControl.maxMessages default? I would imagine the flow to be, we'll set it to x where x is a number that works for 99% of environments, and if you want it faster you can set it higher; as opposed to, we'll have it unbounded and it will OOM any environment given enough pending messages.

@callmehiphop
Contributor

@jadekler by default maxMessages is set to Infinity. However I agree that we would be better off setting it lower and letting the user set higher if need be. Although I'm not certain what a reasonable default would be.

@jeanbza
Author

jeanbza commented Mar 24, 2018

@callmehiphop Right on. In Go we use 1000 as our max outstanding messages limit. In Python I think it's 100? Perhaps something in that range?
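Until the library ships a finite default, a caller could apply one on their side. The following is a minimal sketch: `DEFAULT_MAX_MESSAGES` and `withFlowControlDefaults` are hypothetical names, and the value 1000 is simply taken from the Go figure mentioned above, not a recommendation from the maintainers. The only library shape assumed is the `flowControl: {maxMessages}` option shown earlier in this thread.

```javascript
// Hypothetical default, picked from the range suggested in this thread.
const DEFAULT_MAX_MESSAGES = 1000;

// Hypothetical helper: returns a copy of the subscription options with a
// finite flowControl.maxMessages unless the caller supplied one.
function withFlowControlDefaults(options = {}) {
  const flowControl = Object.assign(
    {maxMessages: DEFAULT_MAX_MESSAGES},
    options.flowControl // ignored by Object.assign when undefined
  );
  return Object.assign({}, options, {flowControl});
}

// Usage, following the subscription call shown earlier in the thread:
//   const subscription = topic.subscription(
//     SUBSCRIPTION_ID, withFlowControlDefaults());
console.log(withFlowControlDefaults());
console.log(withFlowControlDefaults({flowControl: {maxMessages: 50}}));
```

Callers who want the old behaviour can still pass `maxMessages: Infinity` explicitly, so the cap is opt-out rather than hard-coded.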

@jeanbza
Author

jeanbza commented Apr 2, 2018

@callmehiphop In 0.18.0, I get segfaults:

[Screenshot: 2018-04-02, 2:51:17 pm]

Also, could this issue be re-opened please?

@callmehiphop callmehiphop reopened this Apr 2, 2018
@callmehiphop
Contributor

@jadekler is this happening with the code provided in the overview?

@jeanbza
Author

jeanbza commented Apr 2, 2018

@callmehiphop The code is pasted above. I snagged it from an example of ours somewhere.

@jeanbza jeanbza added the "type: bug" and "api: pubsub" labels and removed the "status: release blocking" label Apr 4, 2018
@danoscarmike danoscarmike added the "status: release blocking" and "priority: p1" labels and removed the "api: pubsub" and "type: bug" labels Apr 4, 2018
@danoscarmike
Contributor

@callmehiphop can you please validate Jean's code as a Node expert? If we can repro, this is likely P0. Please let us know. Thanks!

@callmehiphop
Contributor

@danoscarmike absolutely. I had been testing on GCE and had not seen this issue; it wasn't until later that I realized this is happening in GKE. I think we still have gRPC 1.9.x pinned, so I want to see if upgrading to 1.10.x resolves this issue.

@danoscarmike
Contributor

Awesome, thank you! Would be great if the upgrade solves it.

@jeanbza
Author

jeanbza commented Apr 5, 2018

@callmehiphop Just noticed how much this issue has drifted since the original. Let me know if you'd like me to close this and open a new issue with accurate details + repro steps of the current issue.

@callmehiphop
Contributor

@jadekler this should be fine, but thanks for offering!

@callmehiphop
Contributor

@jadekler I've just cut a patch release for the google-gax package that should allow PubSub to pick up the latest gRPC version. On the off chance upgrading resolves this issue, would you mind giving this another go before I do a deep dive into the issue?

@jeanbza
Author

jeanbza commented Apr 7, 2018

@callmehiphop Sure thing! Node noob - will npm installing "@google-cloud/pubsub": "^0.18.0", be fine or do I need a higher version?
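For readers unsure what a caret range such as "^0.18.0" admits, here is a small sketch of npm's caret rule for plain x.y.z versions. It is an illustration only: `parse`, `compare`, `caretUpperBound` and `satisfiesCaret` are hypothetical helpers (not the `semver` package), prerelease tags are not handled, and the version numbers in the usage lines are made up for the example.

```javascript
// Parse "x.y.z" into three non-negative integers.
function parse(version) {
  const parts = version.split('.').map(Number);
  if (parts.length !== 3 || parts.some((n) => !Number.isInteger(n) || n < 0)) {
    throw new Error(`unsupported version: ${version}`);
  }
  return parts;
}

// Compare two parsed versions: negative, zero, or positive.
function compare(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Exclusive upper bound of a caret range: bump the left-most non-zero part.
function caretUpperBound([major, minor, patch]) {
  if (major > 0) return [major + 1, 0, 0];
  if (minor > 0) return [0, minor + 1, 0];
  return [0, 0, patch + 1];
}

// True when `version` falls inside the caret range `range` (e.g. "^0.18.0").
function satisfiesCaret(version, range) {
  const base = parse(range.replace(/^\^/, ''));
  const v = parse(version);
  return compare(v, base) >= 0 && compare(v, caretUpperBound(base)) < 0;
}

// Hypothetical version numbers, for illustration only.
console.log(satisfiesCaret('0.18.4', '^0.18.0'));
console.log(satisfiesCaret('0.19.0', '^0.18.0'));
```

For a 0.x package the caret therefore only admits newer patch releases of the same minor version; a new minor such as a hypothetical 0.19.0 would need the range in package.json to be changed.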

@callmehiphop
Contributor

callmehiphop commented Apr 7, 2018 via email

@jeanbza
Author

jeanbza commented Apr 7, 2018

@callmehiphop I'm not seeing anything coming up, but I'll let it run over the weekend and check it out monday for weirdnesses.

@callmehiphop
Contributor

callmehiphop commented Apr 7, 2018 via email

@jeanbza
Author

jeanbza commented Apr 9, 2018

@callmehiphop Happy to report that I saw no problems over the weekend, and that the node pubsub library chugged through 67M messages at a rate of 61,000 acks / minute.

@jeanbza jeanbza closed this as completed Apr 9, 2018
@lukesneeringer lukesneeringer removed the "priority: p1" label Apr 9, 2018
@callmehiphop
Contributor

@jadekler hooray! Thank you for all your help, I definitely would not have resolved this issue as quickly without it.

@jeanbza
Author

jeanbza commented Apr 9, 2018

No prob!
