Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HUGE memory leak problem #768

Closed
jakeleventhal opened this issue Sep 29, 2019 · 74 comments
Closed

HUGE memory leak problem #768

jakeleventhal opened this issue Sep 29, 2019 · 74 comments
Assignees
Labels
api: firestore Issues related to the googleapis/nodejs-firestore API. external This issue is blocked on a bug with the actual product. priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@jakeleventhal
Copy link

jakeleventhal commented Sep 29, 2019

Environment details

  • OS: Mac OS 10.15 Beta (19A573a)
  • Node.js version: v12.10.0
  • npm version: 6.10.3
  • @google-cloud/firestore version: 2.3.0

I'm still experiencing a memory leak potentially related to #661. The following is from my yarn.lock:

"@grpc/grpc-js@^0.5.2":
  version "0.5.2"
  resolved "https://registry.yarnpkg.com/@grpc/grpc-js/-/grpc-js-0.5.2.tgz#8e39876033749e1611fe9fa181d9535077805abe"
  integrity sha512-NE1tP/1AF6BqhLdILElnF7aOBfoky+4ZOdZU/0NmKo2d+9F9QD8zGoElpBk/5BfyQZ3u1Zs+wFbDOFpVUzDx1w==
  dependencies:
    semver "^6.0.0"

"@grpc/proto-loader@^0.5.0", "@grpc/proto-loader@^0.5.1":
  version "0.5.1"
  resolved "https://registry.yarnpkg.com/@grpc/proto-loader/-/proto-loader-0.5.1.tgz#48492b53cdda353110b51a4b02f465974729c76f"
  integrity sha512-3y0FhacYAwWvyXshH18eDkUI40wT/uGio7MAegzY8lO5+wVsc19+1A7T0pPptae4kl7bdITL+0cHpnAPmryBjQ==
  dependencies:
    lodash.camelcase "^4.3.0"
    protobufjs "^6.8.6"

Quick note: because of the grpc problems that keep appearing here, we should probably put grpc version as part of the issue template

So it is clear that I'm actually using the "correct" grpc. My application makes many calls to firestore and I'm experiencing massive memory leaks.
Screen Shot 2019-09-28 at 6 53 37 PM

I'm not actually seeing the error message described previously (MaxListenersExceededWarning), however, the memory usage on my application slowly and steady is increasing. I've taken several heap snapshots from local tests over the course of several hours and when comparing snapshots from a few hours apart, I notice that the top culprits for memory allocations are all related to grpc.

Here you can clearly see that the top 4 culprits for the memory delta between snapshots 12 and 4 are vastly more than everything else
Screen Shot 2019-09-28 at 7 08 24 PM

Here are the contents of each from largest to smallest:

  1. (string): This by far is the fastest growing constructor. As you can see, there are 920k new allocations in the comparison between the two snapshots. Practically every single string here is ""(function anonymous( ) { return function BeginTransactionRequest(p){ if(p)for(var ks=Object.keys(p),i=0;i<ks.length;++i)if(p[ks[i]]!=null) this[ks[i]]=p[ks[i]] } })". And this all comes from message.js:13. This is a a file from the package protobufjs which is a dependency of grpc.

Screen Shot 2019-09-28 at 7 47 17 PM

  1. (system): Essentially everything here seems to be coming from from the same message.js:13.

Screen Shot 2019-09-28 at 7 29 51 PM

I also have tens of thousands of objects that look like this (from grpc)

Screen Shot 2019-09-28 at 8 10 43 PM

  1. (array): Practically every single array allocation here is a alloc size 32 and is linked to type.js:31 and namespace.js:95, both of which are also part of protobufjs.

Screen Shot 2019-09-28 at 8 18 42 PM

"Field" is also related to the same line numbers and is actually directly linked to grpc

Screen Shot 2019-09-28 at 8 20 38 PM

  1. Array: There are 235k of these and they all look the same - Array with an alloc size of 32 with some connection to grpc

Screen Shot 2019-09-28 at 7 12 35 PM

@jakeleventhal
Copy link
Author

Update: I tried updating to 0.5.4 based on the comments in this thread and it did not help

Here is the new yarn.lock:

"@grpc/grpc-js@^0.5.2":
  version "0.5.4"
  resolved "https://registry.yarnpkg.com/@grpc/grpc-js/-/grpc-js-0.5.4.tgz#20ee7880e69805cbc63a1c54a3af02cc6475c850"
  integrity sha512-aY4fTCz7jq7oKFmfAeZVqGzMCR5I9NLdY9E2fJ70QtGXwlJnTaN6cnbRmCk23/aKPx9UHqOtk2lyjpN6LbAaxw==
  dependencies:
    semver "^6.2.0"

"@grpc/proto-loader@^0.5.0", "@grpc/proto-loader@^0.5.1":
  version "0.5.2"
  resolved "https://registry.yarnpkg.com/@grpc/proto-loader/-/proto-loader-0.5.2.tgz#c84f83be962f518bc303ca2d5e6ef2239439786c"
  integrity sha512-eBKD/FPxQoY1x6QONW2nBd54QUEyzcFP9FenujmoeDPy1rutVSHki1s/wR68F6O1QfCNDx+ayBH1O2CVNMzyyw==
  dependencies:
    lodash.camelcase "^4.3.0"
    protobufjs "^6.8.6"

@yoshi-automation yoshi-automation added the triage me I really want to be triaged. label Sep 29, 2019
@jakeleventhal
Copy link
Author

If it is of any help, the way I use firestore is as follows:

I have a file called firestore.js that looks like this:

import Firestore from '@google-cloud/firestore';

export default new Firestore();

I then import firestore and use it in other places like this:

import firestore from './modules/firestore';

const updateThing = async (req, res) => {
  try {
    const { docId, updatePayload } = req.body;

    await firestore.doc(`Things/${docId}`).update(updatePayload);

    return res.status(200).send();
  } catch (err) {
    console.error(err);
    return res.status(500).send();
  }
};

I can probably have a temporary workaround by just creating a new Firestore instance from within each function body, but fixing the underlying problem is a more sustainable approach.

Also, my application doesn't have any event listeners. The only operations it uses are get(), set(), and update()

@schmidt-sebastian
Copy link
Contributor

@jakeleventhal Thanks for the very detailed issue! Can you describe your work load a bit? It would help to know how many requests you are sending and if the memory usage tapers off once those requests are processed.

@schmidt-sebastian schmidt-sebastian added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. and removed triage me I really want to be triaged. labels Sep 30, 2019
@yoshi-automation yoshi-automation added the triage me I really want to be triaged. label Sep 30, 2019
@bcoe bcoe added priority: p2 Moderately-important priority. Fix may not be included in next release. needs more info This issue needs more information from the customer to proceed. and removed triage me I really want to be triaged. labels Sep 30, 2019
@murgatroid99
Copy link

I have a couple of observations here:

The string that is being constructed almost a million times looks like a generated variant of the protobuf.js Message constructor that is specific to various message types from the .proto files in the Firestore library. The most likely reason for that happening is that the .proto files are getting loaded repeatedly. Other memory usage from protobuf.js probably has the same cause. The library @grpc/proto-loader is essentially just a proxy for protobuf.js, so memory leaks there probably also have the same cause.

The Http2Subchannel objects from grpc are probably the result of repeatedly constructing new clients, and that version of the grpc library tends to keep channels and subchannels alive even when they are not being used. This is improved in @grpc/grpc-js version 0.6.x, mainly because subchannels with identical parameters will be reused.

@jakeleventhal
Copy link
Author

@schmidt-sebastian So I have an API server that accepts HTTP requests from a another scheduling server. The scheduling server repeatedly sends jobs to the API to save user data.

There are many operations that occur but, for the most part, they can all be summarized as this:

  1. Fetch a large data report from a 3rd party API
  2. Parse that data to do one of or both of the following:
  • a) Create a document with set()
  • b) Modify an existing document with update()

There can be anywhere from 1-1000 or so firestore calls per minute per user. Each document that is updated is unique so there is no contention going on here.

I also have used log statements to see that functions are entering and exiting (and they all more or less follow a similar structure to what I wrote above). The scope of the firestore calls should be limited to just the function. Moreover, the firestore calls aren't throwing errors or anything and are updating documents successfully. Again, there are no event listeners anywhere in my code.

I have also tried not using a single instance of Firestore that I import throughout my code - I tried creating a new instance of Firestore within the body of each function that uses firestore. I tracked the number of instances of Firestore objects through memory snapshots and was able to see that the Firestore objects were successfully being deleted/deallocated at the end of each function; however, the metadata with all those strings/arrays/objects still persisted outside the scope of the functions.

I hope that clears up everything.

@swftvsn
Copy link

swftvsn commented Oct 1, 2019

@jakeleventhal we've hit with this kinds of problems too in cloud functions: firebase/firebase-functions#536

The problems originate from grpc-js, which is still experimental.

What worked for us in the end was to downgrade our dependencies so it still uses the binary grpc which is much more mature and stable.

There's also a mechanism to force firestore to use binary grpc with the newer client version described here: firebase/firebase-functions#536 (comment), however I have not tested that one as it is unsupported / untested if I understood correctly. That note also explains a bit why we are in this mess.

So, after grpc-js matures a bit, we’ll hop back to the latest versions.

@jakeleventhal
Copy link
Author

jakeleventhal commented Oct 1, 2019

@swftvsn what functionality will be lost/what differences will there be? also, what is the syntax with ‘@google-cloud/firestore’

@jakeleventhal
Copy link
Author

jakeleventhal commented Oct 1, 2019

@swftvsn @schmidt-sebastian why is firestore not using the binary method of grpc until the other version is stable? what’s to be gained? this memory leak cost me close to $1k on gcp and is now a bottleneck for launching my product. i’m sure i’m not the only one experiencing issues like this

@swftvsn
Copy link

swftvsn commented Oct 1, 2019

@jakeleventhal The reasoning to use grpc-js is explained in this comment: firebase/firebase-functions#536 (comment)

We didn't lose any functionality we need when switching to earlier version, but of course we just checked everything we need. (We use typescript, so basically everything compiles -> you're good to go. ymmw etc. ofcourse.)

@schmidt-sebastian
Copy link
Contributor

Providing a different GRPC implementation via the constructor argument (as described in firebase/firebase-functions#536 (comment)) is fully supported. All currently available features in Firestore are compatible with both grpc and @grpc/grpc-js.

@murgatroid99 Do you have a rough idea of where in our stack this problem might best be addressed?

@murgatroid99
Copy link

I'm only really familiar with my end of the stack, so I can only say that it's somewhere above that. Probably this firestore library. Loading the same .proto file repeatedly is not an intended usage of @grpc/proto-loader, so I recommend that you find a way to load it only once per process.

Similarly, constructing many clients for the same service and backend is not the intended usage of either grpc library, but that should be a much smaller problem with the new release of grpc-js that gax should be picking up soon.

@schmidt-sebastian
Copy link
Contributor

I spent some more time on this, but I am still not able to pinpoint the leak. What is interesting though is that this only seems to happen with set()/update()/delete() and not with a get() call.

After 2000 operations, my memory Profile looks like this:

Screen Shot 2019-10-01 at 2 50 05 PM

The memory usage does not go down even when all operations resolve.

@jakeleventhal
Copy link
Author

Is this confirmed to be fixed using the regular grpc library? i.e.

import grpc from 'grpc'

const firestore = new Firestore({ grpc })

@merlinnot
Copy link

@schmidt-sebastian I don't know if it's related, but there is a similar issue in BigQuery: googleapis/nodejs-bigquery#547

@schmidt-sebastian
Copy link
Contributor

@jakeleventhal I see similar memory profiles with both grpc and @grpc/grpc-js.

@schmidt-sebastian
Copy link
Contributor

It looks like the memory leak happens in one of our depend layers, so it might take a bit of work to track this down. The memory leak also exist when the Veneer client (which is an auto-generated SDK that we use for our request logic) is used directly. What is somewhat interesting is that we only seem to be leaking memory once we get responses from the backend - I see a steady increase of memory consumption only after requests are starting to get fulfilled.

With 2000 writes, FirestoreClient_innerApiCalls has a retained size of 99680 bytes that is mostly unaccounted for. There are also 2000 PromiseWrap and Promise objects on the stack that take up roughly this amount of memory.

@swftvsn
Copy link

swftvsn commented Oct 2, 2019

Great to see the investigation resulting some fruit @schmidt-sebastian - and sorry that I jumped to conclusion and thus wrote bs above..

@jakeleventhal
Copy link
Author

jakeleventhal commented Oct 3, 2019

@schmidt-sebastian There is definitely a huge improvement using the legacy grpc (new Firestore({ grpc })). It's orders of magnitude different.
These are tests run (with the exact same traffic/load)

grpc-js
Screen Shot 2019-10-02 at 8 57 08 PM

grpc
Screen Shot 2019-10-02 at 8 56 40 PM

You'll notice the memory is still climbing in the second example - but its very very minimal compared to the grpc-js version

@urirahimi
Copy link

urirahimi commented Dec 4, 2019

@schmidt-sebastian Any update on "next week" or did you have one of these weeks in mind https://www.mentalfloss.com/article/51370/why-our-calendars-skipped-11-days-1752

@urirahimi
Copy link

But in all seriousness, I'm spending a doggone fortune because of this leak..

@schmidt-sebastian
Copy link
Contributor

I am very sorry, we are still working on this. We are making progress every day and I think I can see the finish line.

@urirahimi
Copy link

Is there any update on this? Bleeding cash for me.

@jakeleventhal
Copy link
Author

Yeah, this has been nearly 3 months... come on...

@schmidt-sebastian
Copy link
Contributor

We are basically done: #779 is waiting for a final review.
We also have some timeout settings that we need to fix in #805, but then we will release new client.

@schmidt-sebastian
Copy link
Contributor

schmidt-sebastian commented Dec 15, 2019

Version 3.0.0 should be released on Monday morning (Pacific time). With this version, we close the GRPC client when a channel becomes idle. If you request load frequently causes new channels to be spawn and released, this should reduce the memory burden described in this issue.

I am sorry and somewhat embarrassed by how long this took. I hope this solves the root issue described in here.

@bfitec
Copy link

bfitec commented Dec 17, 2019

Any idea how soon firebase-admin-node will be upgraded from 2.6v to start using 3.0v of nodejs-firestore? @schmidt-sebastian I'm really glad a solution is in the pipe here, this has caused us a lot of grief, thank you for your efforts!

@jakeleventhal
Copy link
Author

Experiencing over 15x reduction in memory footprint with 3.0.0 👍🏼

@schmidt-sebastian
Copy link
Contributor

@bfitec Unfortunately, we may have another breaking change coming up shortly as Node 8 goes end of life end of the year. We still have to figure out what this means for us and if we have to bump the major version for Firestore again, but we don't want to bump the major version for Firebase Admin twice in short succession and hence we haven't updated Firebase Admin.
Right now, we advise you to manually override the Firestore version for Firebase Admin (via npm shrinkwrap for example). Let us know if this is a huge hassle and we can reconsider.

@jmacedoit
Copy link

@schmidt-sebastian I changed package.lock to edit the version of @google-cloud/firestore to 3.0.0 on a project using firebase admin. However I continue to observe huge amounts of memory (+3Gb)

In my npm ls output only v3.0.0 is shown as used. I also get this message npm ERR! invalid: @google-cloud/[email protected] /Users/joaomacedo/Projects/firestore-cgd-gateway/node_modules/firebase-admin/node_modules/@google-cloud/firestore but I guess that is to be inspected since I altered the dependency forcefully to 3.0.0.

@jmacedoit
Copy link

In our service we are listening for some dozens of thousands of streams. We noticed the more streams we need to be listening to the more memory our process needs. It is currently requiring +4Gb and increasing as new clients join. This should not happen as 98% of the streams have no messages being exchanged other that the first couple. As I said above, we attempted forcing the 3.0.0 firestore cloud version by editing the package.lock.

@jmacedoit
Copy link

jmacedoit commented Dec 18, 2019

@schmidt-sebastian You advised using shrinkwrap. Is using npm shrinkwrap for this case functionally different from changing package.lock?

@schmidt-sebastian
Copy link
Contributor

schmidt-sebastian commented Dec 18, 2019

@jmacedoit I added a branch to firebase-admin-node that should eliviate some of the installation pain. You can install as follows:

 "dependencies": {
    "firebase-admin": "git+https://github.com/firebase/firebase-admin-node.git#firestore-30"
  }

The bad news is that this might not address your specific memory concerns. Our networking infrastructure only supports 100 concurrent operations over a single connection and as such, we need to create new GRPC connection for every 100 operations. The changes in this PR merely address the memory usage of GRPC clients that are no longer needed, but if you have a steady count of thousands of listeners, then some high memory usage is expected (at least for now).

@jmacedoit
Copy link

jmacedoit commented Dec 18, 2019

@schmidt-sebastian Thanks for the branch!

Any tips on how can this be addressed? The project in question is a chatbot gateway. It needs to listen to the streams of thousands of users (1 stream per user), in order to send the message to a core so that it is processed and the answer is placed back in the client stream. Naturally, most of the time the streams will have no activity. (Including people that send 1 message and never interact with the service again) Is there anything that can be done to mitigate the memory needed to maintain these connections?

You say

GRPC only supports 100 concurrent operations at a time and as such, we need to create new GRPC clients for every 100 operations. The changes in this PR merely address the memory usage of GRPC clients that are no longer needed,

The memory increases as soon as I start listening to all the streams even without any incoming messages. Is this expected?

@schmidt-sebastian
Copy link
Contributor

Two options come to mind:

  • If latency allows, use Cloud Functions to trigger an operation whenever a new message is written.
  • Use the same collection ID as a parent to all messages and use a single listener via Collection Groups.

If you have engineering resources to spare, we could also add an option to issue multiple listeners over the same stream. This is supported by the backend and currently used by the Android, iOS and Web SDK. We deliberately decided to use one stream per listeners as this provides isolation for all operations in the Server SDKs, but we could optionally bundle listeners together in the Node Server SDK. Unfortunately, this is a significant effort and will likely not be staffed by us in the near term.

As for your last question - yes, a new operation is spun once onSnapshot is called.

@jmacedoit
Copy link

jmacedoit commented Dec 18, 2019

@schmidt-sebastian Collection groups would be great, unfortunately the client applications are already deployed (and the collection names to which they write have names which differ for each user) and we need a short term solution to the problem. I guess we will have to go with the cloud functions route, although to be honest I prefer to avoiding cloud functions whenever possible...

Edit: Actually, on a closer look I may be able to use collection groups! Thanks

@jakeleventhal
Copy link
Author

Looks like this issue generated yet another issue 🤦🏻‍♂️
#829

@google-cloud-label-sync google-cloud-label-sync bot added the api: firestore Issues related to the googleapis/nodejs-firestore API. label Jan 31, 2020
@siddhant-mohan
Copy link

siddhant-mohan commented May 7, 2020

The problem is still not solved. I am using firebase functions 3.6.1.
I am getting error as:
memory limit exceeded. Function invocation was interrupted

My package.json file is:
{ "name": "functions", "description": "Cloud Functions for Firebase", "scripts": { "lint": "eslint .", "serve": "firebase serve --only v1", "shell": "firebase functions:shell", "start": "npm run shell", "deploy": "firebase deploy --only v1", "logs": "firebase functions:log" }, "dependencies": { "body-parser": "^1.19.0", "camelcase": "^5.3.1", "es6-promise-pool": "^2.5.0", "express": "^4.17.1", "firebase-admin": "~8.0.0", "firebase-functions": "^3.6.1", "firebase-functions-helper": "^0.7.5", "glob": "^7.1.6", "react-native-uuid": "^1.4.9" }, "engines": { "node": "10" }, "devDependencies": { "eslint": "^5.12.0", "eslint-plugin-promise": "^4.0.1" }, "private": true }

My function file is:

'use strict';

const functions = require('firebase-functions');
const admin = require('firebase-admin');
const firebaseHelper = require("firebase-functions-helper");
const db = admin.firestore();

const promisePool = require('es6-promise-pool');
const PromisePool = promisePool.PromisePool;
// Maximum concurrent account deletions.
const MAX_CONCURRENT = 2;
var FieldValue = require('firebase-admin').firestore.FieldValue;


/**
 * Run once a day at midnight, to cleanup the users
 * Manually run the task here https://console.cloud.google.com/cloudscheduler
 */
exports.autoAcceptFriendship = functions.runWith({memory: '2GB', timeoutSeconds: '360'}).pubsub.schedule('every 30 minutes').onRun(async context => {
  // Fetch all user details.
  const pendingFriendships = await pendingFriendship();
  // Use a pool so that we delete maximum `MAX_CONCURRENT` users in parallel.
  const promisePool = new PromisePool(() => approveFriendship(pendingFriendships), MAX_CONCURRENT);
  await promisePool.start();
  console.log('All bot friendship approved');
});

/**
 * Deletes one inactive user from the list.
 */
async function approveFriendship(pendingFriendships) {
  if (pendingFriendships.length > 0) {
    const friendshipToApprove = pendingFriendships.pop();

    console.log('friendship accepted started for ', JSON.stringify(friendshipToApprove));

    //accept Friendship
  try {
    const friendship = {
      user1: friendshipToApprove.user1,
      user2: friendshipToApprove.user2,
      jointIds: [friendshipToApprove.user1, friendshipToApprove.user2].sort(),
      created_at: FieldValue.serverTimestamp(),
    };
    const newDoc = await firebaseHelper.firestore
      .createNewDocument(db, 'friendships', friendship);

    const deletedContact = await firebaseHelper.firestore
      .deleteDocument(db, 'pending_friendships', friendshipToApprove.id);
    console.log('friendship accepted');
  }
  catch (error) {
    console.log('error occured', error);
    }
  } else {
    return null;
  }
}

/**
 * Returns the list of all inactive users.
 */
async function pendingFriendship() {

  let date = new Date();

  date.setDate(date.getDate() - 1);

  console.log('date is ', date)

  let userIds = ['vUtLDsL7JvUMP4mxJi1eqiB2HXc2'];


  let p_friendship_array = [];

  let pend_friendship_data = await firebaseHelper.firestore.queryData(db, 'pending_friendships', [['created_at', '>=', date]]);


  let data = [];

  Object.keys(pend_friendship_data).forEach((key, Index) => {
    let val = pend_friendship_data[key];
    val = {...val, id: key}
    data.push(val)
  })

  p_friendship_array = data.filter(u => {
    return userIds.indexOf(u.user2) !== -1;
  });

  console.log('no of pend friendship request is ', p_friendship_array.length)

  return p_friendship_array;
}
```

@schmidt-sebastian
Copy link
Contributor

@siddhant-mohan If you turn on logging (setLogFunction) we can check how many GRPC clients you are using. Given your usage of PromisePool, this should just be one client (which can handle 100 concurrent requests). The debug logs would help us verify.

@siddhant-mohan
Copy link

@schmidt-sebastian hey can you help me how to turn on logging to check GRPC clients my function is using? Sorry I am quite new to the firebase functions

@schmidt-sebastian
Copy link
Contributor

It should be:

admin.firestore.setLogFunction(console.log);

@siddhant-mohan
Copy link

This is what i am getting as a setLogFunction output

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: firestore Issues related to the googleapis/nodejs-firestore API. external This issue is blocked on a bug with the actual product. priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet
Development

No branches or pull requests