Possible memory leak when using Google Sheets API #535
I've been able to replicate this with a variant of your script, thanks! I think it may be possible to cache the result of createMethod so that we don't create a new closure for every call, which would make the memory leak less dramatic (probably not noticeable) and the code faster. But before I do that I'll try to identify the actual cycle.
Running debugging commentary for future reference. (The code snippets and sample output originally attached to this comment were not captured in this export.)
…en code causes createMethod to be repeatedly called against the same input params
This is the "leak": a very simple circular reference. It cannot be avoided without a redesign. It is caused by implementing "dynamic methods" using the descriptor protocol, à la https://github.com/google/google-api-python-client/blob/master/googleapiclient/discovery.py#L1088. If there is a way around this circref, I don't know it.
Now that we've figured out why we perceive there is a leak: a) is it a problem? and b) if so, what can we do about it?

Is it a problem? Yes and no. Theoretically, no, it's not a problem. Reference cycles in programs are normal, and Python's garbage collector will eventually trash the objects involved in cycles. Every time a Resource object is created (by calling methods on other Resource objects), we get some cycles between that Resource and its dynamic methods, and, in theory, this is fine. Practically, yes, it's a problem. Repeated Resource creation causes the process RSS to bloat, and, on Linux at least, the memory consumed by these references is not given back to the OS due to memory fragmentation, even after the cycles are broken.

What can we do about it? I've put in some work on a branch (https://github.com/mcdonc/google-api-python-client/tree/fix-createmethod-memleak-535) trying to make the symptoms slightly better. Try #1 on that branch, which is now a few commits back and isn't represented by the current state of the branch, was caching the functions that become methods on Resource objects, creating only one function per input instead of one per call. This is not a reasonable fix, however, because refs involved in cycles still grow; every time a Resource is instantiated, it binds itself to some number of methods, and even if the functions representing these methods are not repeatedly created, the act of binding cached methods to each instance still creates cycles. Try #2, which represents the current state of the branch, dynamically creates and caches one Resource class per set of inputs, instead of just caching the result of dynamic method creation. This avoids using the descriptor protocol to bind dynamic methods to instances, so the only circrefs are those you'd get if each resource type had its own class in sys.modules['googleapiclient.discovery'].
The number of circrefs is dramatically reduced, and RSS growth is bounded after the first call of the replication script (unlike master, where it grows without bound on each call, although every so often gc kicks in and brings it down a little). According to gc.set_debug(gc.DEBUG_LEAK) under py 3.6, the length of gc.garbage is 2214 after 40 iterations of the reproducer script's for-loop, instead of master's gargantuan 45218. And I believe we could bring that down further by fixing an unrelated leak. So I think we have these options:
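The cycle described above can be reproduced in isolation. A minimal sketch (not the library's actual code) of a Resource-like object that binds a closure to itself, showing that only the cycle collector can reclaim the instances:

```python
import gc

class FakeResource:
    """Mimics the cycle created by discovery's dynamic methods:
    instance -> bound closure -> instance."""
    def __init__(self):
        def dynamic_method():
            return self  # the closure captures the instance
        self.dynamic_method = dynamic_method

gc.collect()   # start from a clean slate
gc.disable()   # stop automatic collection so the cycles pile up visibly
for _ in range(1000):
    FakeResource()        # dropped immediately, but trapped in a cycle
freed = gc.collect()      # manual collect works even while auto-gc is off
gc.enable()
print(freed >= 1000)      # → True
```

With automatic gc disabled, refcounting alone never frees the instances; the explicit `gc.collect()` reports at least one unreachable object per instance, which is the behavior the reproducer script observes at scale.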
Thank you very much for the thorough analysis. As you already said, a complete fix would require some time to implement, so the temporary fix using a cache along with gc.collect() will do for now. Again, thanks!
@mrc0mmand yes, in your particular case, creating "sheets" only once would make it leak so little that you won't need gc.collect().
@theacodes can you advise about which one of the options in #535 (comment) is most appropriate?
@mcdonc @theacodes If I get a vote, I'd like to see the second option, adding a .close() method. I've spent the past week or so tracking down this same memory error and found my way to this page. As it happens, I specifically looked for close() methods on the Resource objects because I knew something somewhere wasn't being released. Adding a .close() method seems cleaner than my having to call gc.collect(). Either way I have to do something to clean up resources, and calling .close() is analogous to what we already do for files and other things. In any case, this issue should be mentioned in the documentation and sample code, please!
I have a cron job on Google App Engine that reads data in from a Google Sheet. I am noticing the same memory leak (or maybe a different one?). I tried the recommended workarounds: (1) creating the "sheets" object only once, and (2) using gc.collect(). Neither worked in my case. As a test, I changed the few lines of code that read data from a Google Sheet to read data from a database table instead, and the memory leak went away.
Can you clarify your last sentence here? Did you ever fix the code, or just confirm the memory leak? I'm in the same situation: an App Engine job that reads a Google Sheet and started getting "Exceeded soft memory limit" errors. And like you, the garbage collection suggestions did not help my situation.
I never fixed it... in the short term, I used a high-mem App Engine instance so it would take longer to hit the memory threshold, and then, as a long-term solution, I switched to Airtable instead of Google Sheets.
Got it - I'll look into using airtable instead. Great suggestion. Appreciate the help. |
My solution in the end was just to use "pure" HTTP requests.
@AmosDinh Can you elaborate? In my project the memory leak issue has reared its ugly head again and I am looking for new approaches to deal with it. |
@hx2A Well I don't use the the api at all,except for creating credentials. Here is an example from my class.
and
|
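Since the original snippets weren't preserved, here is a hedged sketch of what the "pure HTTP" approach might look like. The file name, spreadsheet ID, and range are placeholders; `service_account.Credentials` and `AuthorizedSession` are from the real google-auth package, and the URL follows the public Sheets v4 REST endpoint for `spreadsheets.values.get`:

```python
# Sketch of calling the Sheets REST API directly, bypassing
# googleapiclient's dynamic Resource objects (and their cycles).

SHEETS_BASE = "https://sheets.googleapis.com/v4/spreadsheets"

def values_url(spreadsheet_id: str, range_: str) -> str:
    """Build the spreadsheets.values.get endpoint URL."""
    return f"{SHEETS_BASE}/{spreadsheet_id}/values/{range_}"

def fetch_values(session, spreadsheet_id: str, range_: str):
    """GET a value range; `session` is an authorized requests-like session."""
    resp = session.get(values_url(spreadsheet_id, range_))
    resp.raise_for_status()
    return resp.json().get("values", [])

# Usage (requires `pip install google-auth requests`; not run here):
#   from google.oauth2 import service_account
#   from google.auth.transport.requests import AuthorizedSession
#   creds = service_account.Credentials.from_service_account_file(
#       "service_account.json",  # placeholder path
#       scopes=["https://www.googleapis.com/auth/spreadsheets.readonly"])
#   session = AuthorizedSession(creds)
#   rows = fetch_values(session, "SPREADSHEET_ID", "Sheet1!A1:C10")
```

Because no Resource objects are ever built, there are no descriptor-protocol cycles to leak; the only long-lived objects are the credentials and the session.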
@AmosDinh Thank you, I understand now. This is helpful. |
@hx2A glad I could help you |
Alternatively, you could run the Sheets API code in another process, which you could terminate after execution or once RAM usage hits a certain threshold. Just wanted to include that option.
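A minimal sketch of that subprocess approach; `fetch_sheet` is a hypothetical stand-in for the actual Sheets call:

```python
import multiprocessing as mp

def fetch_sheet(queue):
    # Hypothetical stand-in for the googleapiclient call; any memory
    # it leaks dies with the worker process.
    queue.put([["a1", "b1"], ["a2", "b2"]])

def fetch_in_subprocess():
    """Run the leaky API code in a child process; the OS reclaims all
    of its memory (including fragmented RSS) when the process exits."""
    queue = mp.Queue()
    worker = mp.Process(target=fetch_sheet, args=(queue,))
    worker.start()
    result = queue.get()   # read before join to avoid a queue deadlock
    worker.join()
    return result
```

This sidesteps the fragmentation problem entirely: even memory the allocator would never return to the OS is freed when the worker exits.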
@AmosDinh I'm trying to get this working now but I keep getting 403 'Forbidden' responses. I believe it has something to do with my service account's roles and permissions. Can you tell me about how your service account is configured? My current memory leaking code doesn't seem to be using the service account so I need to be sure it is configured correctly. |
Which drive are you accessing? Your personal one, which you can reach by going to https://drive.google.com? In that case you have to add the service account by its email to a folder as editor/owner, then you can edit or create files in that folder using the service account credentials. |
I just got it to work. I had multiple problems, but a big part of it was that I needed to share the files on my Google Drive with the service account's email address. Thanks!
I had the same memory leak, so I just sprinkled gc.collect() everywhere and bam, now it's manageable. I doubt that this counts as a fix though.
@marvic2409 you might have a slow memory leak that leaks a small number of MB every hour. For a decent-sized system, it will take some time to become a problem. |
Circling back to this: the recommendation from #535 (comment) is the best way to avoid this.
Creating multiple service objects results in (1) potential memory problems and (2) takes extra time for refreshing credentials. If you're creating a service object inside a loop, or a function that's called more than once, move it outside the loop/function so it can be reused. |
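A sketch of that reuse pattern. Here a stub stands in for `googleapiclient.discovery.build` (the real call would be `build('sheets', 'v4', credentials=...)`); the point is simply to construct the service once and reuse it:

```python
import functools

construction_count = 0

def build_sheets_service():
    """Stub for googleapiclient.discovery.build('sheets', 'v4', ...),
    which is expensive and leaves reference cycles behind."""
    global construction_count
    construction_count += 1
    return object()  # stand-in for the Resource object

@functools.lru_cache(maxsize=None)
def get_sheets_service():
    """Memoize the service so loops and repeated calls reuse one object."""
    return build_sheets_service()

def read_rows():
    service = get_sheets_service()  # reused, not rebuilt, on every call
    return service

for _ in range(100):
    read_rows()
print(construction_count)  # → 1
```

One construction for a hundred calls: both the memory growth and the repeated credential-refresh overhead go away.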
There seems to be a memory leak when using the google-api-client with GSheets.
Environment:
Here's a simple reproducer (without a `.client_secret.json`). For measurements I used the `memory_profiler` module, comparing the first and second iterations against the last one. (The reproducer script and profiler output were not captured in this export.)
There's clearly a memory leak, as the reproducer fetches the same data over and over again, yet the memory consumption keeps rising. Full log can be found here.
As a temporary workaround for one of my long-running applications I use an explicit garbage collector call, which mitigates this issue, at least for now.
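The original snippet wasn't captured; the workaround presumably amounts to forcing a cycle collection after each batch of API calls, along these lines (`fetch_data` is a hypothetical stand-in for the Sheets request):

```python
import gc

def fetch_data():
    # Hypothetical stand-in for spreadsheets().values().get().execute().
    return [["row"]]

def poll_forever(iterations=100):
    """Long-running loop: collect cycles explicitly after each request
    so the Resource <-> dynamic-method garbage doesn't accumulate."""
    results = []
    for _ in range(iterations):
        results.append(fetch_data())
        gc.collect()  # break the reference cycles immediately
    return results

print(len(poll_forever()))  # → 100
```

This bounds the number of uncollected cycles alive at any moment, though on Linux fragmented RSS already acquired may still not be returned to the OS.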
I went a little deeper, and the main culprit seems to be in the `createMethod` function when creating the dynamic method `batchUpdate`. (This method has a huge docstring.)
Nevertheless, there is probably a reference loop somewhere, as the `gc.collect()` call manages to collect all those unreachable objects.