Should retry on ECONNRESET #2254
Comments
Response to googleapis#2254. This should hopefully catch the issue customers are seeing in Google Cloud Functions, where Google APIs get into a permanently failed state.
Continuing from #2255 (comment) (cc @s2young) I'm not able to get the process mentioned in this issue to fail. Does anyone have a foolproof way I can get this script or another to break? I've been uploading 24 images at a time and the thumbnails are all generated successfully. |
My app is doing quite a bit more than the example, and I'm stripping some of this out to see if I can avoid the problem. But all is just standard Storage api stuff:
The application is a mobile app, with a content management piece that allows users to upload photos when building a layout. Images are resized for display on Tablets, Large/Small phones, etc. |
One more caveat @stephenplusplus: If you follow the pattern I've given, make sure you ignore the upload of the copies. In my case, when I upload a resized copy I am adding some metadata to the image so that the imageListener function can ignore those storage events. I got myself into an infinite loop early on! |
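For anyone following the same pattern, here is a minimal sketch of that guard. The metadata key name, both helper names, and the assumption that custom metadata arrives on the event object under `metadata` with string values are illustrative, not taken from @s2young's code:

```javascript
// Hypothetical metadata key; the thread does not say which key was actually used.
const GENERATED_FLAG = 'generatedThumbnail';

// Returns true when a storage event describes a resized copy that the
// listener created itself, so the listener can skip it and avoid a loop.
// Assumes custom metadata is exposed as string values under `object.metadata`.
function isGeneratedCopy(object) {
  const metadata = (object && object.metadata) || {};
  return metadata[GENERATED_FLAG] === 'true';
}

// Builds upload options for a resized copy, tagging it with the flag.
// Assumes custom metadata is nested under `metadata.metadata` in the options.
function copyUploadOptions(destination) {
  return {
    destination: destination,
    metadata: { metadata: { [GENERATED_FLAG]: 'true' } }
  };
}
```

The listener would call `isGeneratedCopy(event.data)` (or the equivalent for your trigger) first and return early when it is true.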
Haha, thanks for the tip! I was able to reproduce the error (ECONNRESET and ESOCKETTIMEDOUT) but it was only when I was deleting the file before it was done being used. I'm not sure why that triggered those errors, but that's the only thing I can pin-point as the cause. I'll continue to play around. Here's my script-- https://gist.github.com/stephenplusplus/1ac55732adadc1562737cf903307a98d. It takes an upload and creates 10 thumbnails. I'm testing with 25 at a time, and I'm consistently ending up with the correct total of images in the bucket, 250. |
My theory is that this happens after periods of inactivity, when the Cloud Function instance is throttled. I think there's a cached socket that dies--possibly from being too CPU starved to acknowledge any keep-alive fast enough. Try the same script now (hopefully your instance is still alive) or after a 20m break following that attempt. Once you see it, the error should be pretty fatal until a redeploy (which forces a reallocation of your Cloud Function instance). |
Another update, in case it helps. I have another Firebase Function that is listening to the RealTime DB for changes and then deleting the related image from Google Cloud Storage. Basically, in my CMS when a user deletes an image he's deleting the data representation of it in Firebase DB. The listener then deletes the actual image from Storage. I am experiencing the ECONNRESET error in this db listener function. The result is I have orphaned images that are attached to object ids that no longer exist in my db. |
@inlined I ran the script successfully, waited 20 minutes, and it ran successfully again without errors. Would anyone be able to give me a repo I can clone, deploy, and then steps I can follow to reproduce the error? |
Hey all, I forked your gist and added a couple of things to hopefully help - though I didn't test this so be warned!
My recommendation is run the script, wait a few minutes, and then manually delete some of the nodes created in the real-time db and see if the ECONNRESET happens. I hope this helps @stephenplusplus ! https://gist.github.com/s2young/068f082fd30d17f0a1f2146d953106de |
Hey @stephenplusplus I've added you as a collaborator to my 'app-server' project. You should be able to deploy this to your firebase project, and then locally run 'jasmine' command to trigger the /spec/dev/mage.spec.js test. In this test's current state, I'm seeing tons of ECONNRESETs in my Firebase Functions logs. I hope it helps. |
Also, if it helps, I noticed a big jump in ECONNRESETs when I used a singleton instance of a google-cloud/storage object rather than one instance per function. |
And yet another thing: I just updated my firebase-tools, and now the firebase.config().firebase call no longer works, so the code that names the bucket is broken. You'll need to tweak that. Do you know how to programmatically grab the current-context projectId from either firebase or google-cloud objects? It seems to me that if I'm authed on my local machine, my SDK should be able to know the context it's running in. Or do I have to have a config file? Thanks @stephenplusplus |
I did some digging and found a post on StackOverflow from someone who ran into this (cc @DrPaulBrewer). He opened an issue on the official issue tracker, "cloud storage frequently throws ECONNRESET, read or write, from cloud functions environment." It seems the community has found a couple solutions:
@inlined @DrPaulBrewer @hayanmind can you check your apps using the newest As a side note, you should be able to set
    var gcs = require('@google-cloud/storage')(...)
    gcs.interceptors.push({
      request: function(reqOpts) {
        reqOpts.forever = false
        return reqOpts
      }
    })
@s2young thank you for that gist -- I set it up and ran it. I've been doing this for the last several hours, waiting anywhere from 3, 5, 10, 25, and 60 minutes between events and monitoring the logs. Still, everything is working consistently for me.
The libraries that I help maintain are the @google-cloud/* ones, so in this case
    var authClient = require('google-auto-auth')()
    authClient.getProjectId(function(err, projectId) {})
That will pick it up from the |
@s2young I'm doing the exact same steps (except augmenting metadata) in my app, and I spent a week racking my brain over why using async/await caused this problem. After changing my code back to promises I kept getting the error, and then I figured out it wasn't my code but Cloud Functions. @stephenplusplus as I stated above, my code does exactly the same things as @s2young's, and after seeing your reply, I've deleted However I didn't see any version number change in Update After |
Yes, I added the retry solution in a dependency, retry-request, so new and re-installations of Thanks for trying these solutions out. Please keep letting us know the results you get. |
If everyone can confirm:
we can automatically set |
So far, that is true for me. I haven't experienced any further ECONNRESETs since making the forever:false change. Thanks so much! |
We have disabled the forever agent by default in Cloud Function environments. If you un- and re-install @google-cloud/storage, you will pick up the new behavior automatically. Thanks for all of the helpful debugging, everyone! |
@stephenplusplus Thanks for your efforts. Other tasks here took priority. I updated my SO posting to reflect the above changes but have not had time to test. |
I am seeing ECONNRESET in my Google Cloud Functions now. I am using Unfortunately, even explicitly setting the |
Could this be due to the fact that network and CPU are throttled down to 0 between function invocations, so the connection is lost on the new invocation? |
I get this error 100% of the time when trying to upload files in parallel. Adding
did nothing to resolve the issue. Our use case is using the client library to upload files in parallel, a fairly straightforward use case. |
edit by @stephenplusplus
If you're just joining the conversation...
Here is where we stand: We have disabled the forever agent by default in Cloud Function environments. If you un- and re-install @google-cloud/storage, you will pick up the new behavior automatically.
Original Post
Environment details
Steps to reproduce
Theory
I've noticed this in my own Cloud Functions and only in those that use google-cloud-node. I think the socket is dying between requests when the Cloud Functions environment is throttled. This should be OK, but I suspect the google-cloud-node module needs to add an extra error case here. I'll write a pull request to that effect.
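As a rough illustration of the "extra error case" idea, here is a minimal sketch of a retry wrapper that treats socket-level failures as retryable. It is not the change made in google-cloud-node or retry-request; `isRetryable`, `withRetry`, and the backoff parameters are hypothetical names and values chosen for this example, and the error codes are the two reported in this thread:

```javascript
// Error codes reported in this thread; treated here as safe to retry.
const RETRYABLE_CODES = ['ECONNRESET', 'ESOCKETTIMEDOUT'];

// Returns true only for errors whose `code` is one of the codes above.
function isRetryable(err) {
  return Boolean(err) && RETRYABLE_CODES.indexOf(err.code) !== -1;
}

// Calls `fn` (a function returning a promise) up to `maxAttempts` times,
// waiting baseDelayMs, 2*baseDelayMs, 4*baseDelayMs, ... between attempts.
// Errors that are not retryable are rethrown immediately.
function withRetry(fn, maxAttempts, baseDelayMs) {
  function attempt(n) {
    return fn().catch(function (err) {
      if (!isRetryable(err) || n >= maxAttempts) {
        throw err;
      }
      const delay = baseDelayMs * Math.pow(2, n - 1);
      return new Promise(function (resolve) {
        setTimeout(resolve, delay);
      }).then(function () {
        return attempt(n + 1);
      });
    });
  }
  return attempt(1);
}
```

A caller could wrap an idempotent operation such as a delete, for example `withRetry(function () { return file.delete(); }, 3, 200)`, where `file` is assumed to be a Storage file object. Retrying non-idempotent requests this way may not be safe.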