Storage API has poor performance in Google Cloud Functions #101
Comments
All other API requests made through the Storage API should be using the Keep-Alive header. The default configuration for all requests is here: https://github.com/googleapis/nodejs-common/blob/22383ec89b2866b52850594efab60bd5484a81cc/src/util.js#L30-L37

It's possible for users to change all of our settings with request interceptors. That would look like:

```js
const Storage = require('@google-cloud/storage')
const gcs = new Storage({ ... })

gcs.interceptors.push({
  request: reqOpts => {
    reqOpts.forever = true
    return reqOpts
  }
})
```

What do you think -- should we reverse the default?
Thanks for the tip!
My suggestion is that the API should manage this based on the action being performed, if it is a requirement. For example, if I have a download operation to fetch data, do stuff with it, and then upload it, I have to keep it straight in my code what the library dependency needs for downloading (turn off keep-alive) and then turn it back on for uploading. I currently have Google Functions that are timing out due to this issue, where uploading to Storage now requires the interceptor to be updated. With a previous version I didn't need this (1.1.1, for example, "just worked"), but with 1.5.x the reqOpts.forever setting is causing the uploads to time out on about 30% of the function invocations.
Thanks for the input.
The only time we disable

If we did have some methods using keepAlive, and multiple others not, you could get specific with the interceptor:

```js
gcs.interceptors.push({
  request: reqOpts => {
    if (reqOpts.method === 'POST' && reqOpts.uri.includes('{somethingFromTheApiUrl}')) {
      reqOpts.forever = true
    }
    return reqOpts
  }
})
```

And to go even further, you can assign interceptors on all levels of the hierarchy:

```js
gcs.interceptors.push({ request: function(reqOpts) {} })
bucket.interceptors.push({ request: function(reqOpts) {} })
file.interceptors.push({ request: function(reqOpts) {} })
```
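For instance (a sketch with placeholder bucket and file names, building on the interceptor API shown above), a per-file interceptor could re-enable keep-alive only for requests made through that one file, leaving every other operation untouched:

```js
// Sketch with placeholder names: re-enables keep-alive only for this file's
// requests, so other requests keep the default behavior.
const bucket = gcs.bucket('my-bucket')
const file = bucket.file('large-upload.bin')

file.interceptors.push({
  request: reqOpts => {
    reqOpts.forever = true
    return reqOpts
  }
})
```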
Thanks @stephenplusplus - I'm having a terrible issue where the upload process from my cloud functions to Cloud Storage is timing out nearly 90% of the time now. This started around December 6th. I thought perhaps it was something with my setting keep-alive to false to fix the download issue, but with updated code that creates a new storage object to use, I'm still getting more timeouts than successes. And I haven't a clue who, what, or where I should go for help other than chucking out $300 for support with Google.
In my case the GCP Console was very helpful for debugging this issue.
Thanks @dmho418 for the hint. However, my issue is that the Google Function I'm using to upload content to GCS just times out. I built a very simple test function:
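(The original snippet wasn't preserved in this thread. As a stand-in, here is a minimal sketch of a comparable test, assuming the function fetches a remote file by URL and streams it into a bucket; the bucket, file, and handler names are placeholders.)

```js
// Hypothetical reconstruction, not the original test function.
const https = require('https')
const Storage = require('@google-cloud/storage')
const gcs = new Storage()

exports.copyToGcs = (req, res) => {
  const sourceUrl = req.query.url // e.g. a ~5MB+ image
  const destination = gcs.bucket('my-test-bucket').file('copied-image.jpg')

  // Stream the remote file straight into Cloud Storage.
  https.get(sourceUrl, response => {
    response
      .pipe(destination.createWriteStream())
      .on('error', err => res.status(500).send(err.message))
      .on('finish', () => res.status(200).send('done'))
  })
}
```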
If I point this to a file that's less than 5MB, the function executes and finishes successfully. If I point the script to an image that's larger than 5MB, the function fails. This behavior started to manifest on December 7th; however, the change to the Google environment likely took place earlier. There's an open ticket in the issue tracker, https://issuetracker.google.com/issues/70555688, opened by someone else with the same timeout issue; however, they say they have been able to download and upload payloads larger than 10MB.
I'm sorry, I was mistaken. I forgot we had a request in April (2017) to not send the Keep-Alive header for Cloud Functions in all of our APIs (googleapis/google-cloud-node#2254). This seems to be a unique issue for GCF, so we'll have to see where we get in the Google issue tracker: https://issuetracker.google.com/issues/70555688

For now, I'll call this blocked on our part (the client library). Feel free to provide any extra information, especially if anyone else is having the same problem or has an idea where we might look for an answer.
Having the same issue with Firebase Functions:
Hey @stephenplusplus,
I can confirm the suggested options: they increased the speed of my uploads from ~20-30 seconds to 2-3 seconds.
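(The exact options aren't preserved in this thread. As an assumption, the configuration most commonly cited at the time for speeding up uploads from Cloud Functions was disabling resumable uploads, and optionally checksum validation, roughly like this:)

```js
// Assumed example; not necessarily the exact options from the comment above.
// Resumable uploads add extra round-trips, which is costly from a Cloud Function.
bucket.upload('/tmp/local-file.jpg', {
  destination: 'uploads/local-file.jpg',
  resumable: false,
  validation: false
}, (err, file) => {
  if (err) {
    console.error('upload failed', err)
    return
  }
  console.log(`uploaded ${file.name}`)
})
```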
@stephenplusplus Would it make sense to make the options @danielrasmuson suggests the default?
@fhinkel I'm not sure. The suggested fix for this issue would reverse the fix for googleapis/google-cloud-node#2254.
Thanks for your help on this issue, @stephenplusplus. I have the following use case: I read a stream from a remote file stored on Cloud Storage, process it through multiple transforms, and finally write the processed files into two different buckets.

Locally, I get the two files correctly written to both destination buckets, but I'm getting timeouts when I run this code in my Cloud Function environment, no matter which of the options discussed above I choose.

It looks like the pseudoStream transform slows down the whole process, as it needs to access other remote resources for each chunk processed through the stream, thus causing the timeout. The only workaround I've come up with so far is to process smaller files (~5MB instead of ~10MB) to avoid the timeout.
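(The code for this pipeline isn't preserved in the thread; the following is a rough sketch under those assumptions, with pseudoStream standing in for the transform that fetches extra remote data per chunk, and placeholder bucket and file names.)

```js
// Rough sketch of the described pipeline, not the original code.
const { Transform } = require('stream')
const Storage = require('@google-cloud/storage')
const gcs = new Storage()

// Stand-in for the per-chunk transform; in the real code each chunk triggers
// additional remote lookups, which is what appears to slow things down.
const pseudoStream = new Transform({
  transform(chunk, encoding, callback) {
    callback(null, chunk)
  }
})

const source = gcs.bucket('source-bucket').file('input.csv').createReadStream()
const processed = source.pipe(pseudoStream)

// A readable stream can be piped to more than one destination bucket.
processed.pipe(gcs.bucket('bucket-a').file('output.csv').createWriteStream())
processed.pipe(gcs.bucket('bucket-b').file('output.csv').createWriteStream())
```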
Hey @victor5114, would you mind opening a new issue for that? I don't believe this is related (or at least, there could be an unrelated way to solve it).
Our library is now using a lot of new guts, especially in terms of the transport libraries. Is anyone still seeing a performance drop in GCF using a newer version of Storage?
Closing this stale issue on the recommendation to try newer versions of the library with the new transport implementations. @dmho418, please feel free to reopen if you are still experiencing the problem.
edit by @stephenplusplus
Follow along in the Google issue tracker: https://issuetracker.google.com/issues/70555688
Environment details
Steps to reproduce
The API seems to never reuse connections, which causes Cloud Functions using this API to have poor performance and blow up socket connection and DNS quotas very easily.
In the best practices guide (https://cloud.google.com/functions/docs/bestpractices/networking) they give the Node.js PubSub client as an example, which, when declared globally, avoids unnecessary DNS queries and connections.
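A sketch of that pattern applied to the Storage client (the function, bucket, and variable names here are placeholders, not from the guide):

```js
// Create the client once in global scope so warm invocations can reuse
// connections, instead of constructing it inside the handler on every call.
const Storage = require('@google-cloud/storage')
const gcs = new Storage()

exports.listFiles = (req, res) => {
  gcs.bucket('my-bucket').getFiles((err, files) => {
    if (err) {
      res.status(500).send(err.message)
      return
    }
    res.status(200).send(files.map(f => f.name).join('\n'))
  })
}
```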
This could be because the configuration of the requests is hardcoded: nodejs-storage/src/file.js, line 510 (at 07130a5).