Storage client randomly dies while downloading a file #2093
While trying to diagnose this we've added

```ts
process.on('unhandledRejection', r => console.error(r))
process.on('uncaughtException', r => console.error(r))

console.log(`Process ID: ${process.pid}`)

process.on('SIGHUP', () => console.log('Received: SIGHUP'))
process.on('SIGINT', () => { console.log('Received: SIGINT'); process.exit() })
process.on('SIGQUIT', () => console.log('Received: SIGQUIT'))
process.on('SIGTERM', () => console.log('Received: SIGTERM'))
process.on('SIGUSR1', () => console.log('Received: SIGUSR1'))
process.on('SIGUSR2', () => console.log('Received: SIGUSR2'))

// The 'exit' event only provides the exit code
process.on('exit', (code) => console.log('EXIT', code))

// Trap explicit process.exit() calls and print a stack trace before exiting
const temp = process.exit
process.exit = function () {
  console.trace()
  process.exit = temp
  process.exit()
}
```

but none of it ever produces any output.
Hi @swftvsn, thank you for reporting this issue. In your first example, is the exit occurring at the first file in the loop or only after a certain threshold? What is the size of the files? You mention this is happening in a cloud function; if you run it outside of a cloud function, do you see the same exit behavior?
File sizes (the files are packed WebP images) are between 5 kB and 500 kB. It happens on Cloud Functions and on our development Macs. The download consists of 25 files. It exits randomly, usually during the first 10 files, but I've seen some runs download all files successfully. I've tried downloading using streams, a buffer, and a destination file, with no difference. I've also tried running with md5, crc32c and false as the integrity check, but that changes nothing either. Usually we get some kind of error message back (missing rights, timeout or similar). We have terabytes of data, so we have a lot of Cloud Functions downloading and uploading all the time successfully. In this case all the checks succeed and the download starts, but it is cut off suddenly (as far as I can tell). Even more disturbing is that the whole Node.js process goes down as a result.
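(For reference, a minimal sketch of the three download variants and the integrity-check setting mentioned above, assuming a bucket opened the same way as in the test script later in this thread; the bucket and object names are placeholders.)

```ts
import { createWriteStream } from 'fs'
import { Storage } from '@google-cloud/storage'

const storage = new Storage()
const bucket = storage.bucket('bucket')           // placeholder bucket name
const file = bucket.file('path/to/image.webp')    // placeholder object name

;(async () => {
  // 1. Download to a local destination file
  await file.download({ destination: './image-to-disk.webp' })

  // 2. Download into memory as a Buffer
  const [contents] = await file.download()
  console.log('Downloaded', contents.length, 'bytes')

  // 3. Download as a stream
  file.createReadStream()
    .on('error', err => console.error('Stream error:', err))
    .pipe(createWriteStream('./image-from-stream.webp'))

  // The integrity check can be set to 'md5', 'crc32c', or disabled entirely
  await file.download({ destination: './image-no-validation.webp', validation: false })
})().catch(err => console.error(err))
```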
I can confirm that
Thanks @swftvsn, I will do some further tests to attempt to recreate this. I know you said none of the handlers you created are hit, but are there any other logs or stack traces you might be able to provide? Any chance you could share a bit more of the logic / code around the calls to download?
@ddelgrosso1 I have a narrowed-down, (randomly) failing test script pointed at a directory in our environment, with a service account limited to reading only those files. I can provide the example file + SA JSON, but not publicly; what's the best way to provide those?
I don't think the SA JSON is necessary; I have accounts I can test with. I also don't need the entirety of your production code, but if you can provide a relevant snippet that might help in debugging, that would be great. If the code is sensitive, perhaps you can place the relevant portion in a private repository and add me to it?
I simply suspect that the issue is related to the storage bucket, but maybe not. I can share the code, as it is custom written for this exact scenario. We can, again, download with the same exact code from many buckets, but not from that particular one.

test-download.ts (this is the whole file; only the filenames array is truncated):

```ts
import { existsSync, mkdirSync, readFileSync, rmSync } from 'fs'
import { DownloadOptions, Storage } from '@google-cloud/storage'

(async () => {
  // Initialise Storage
  const serviceAccountBfr = await readFileSync('./google-sa.json')
  const serviceAccountObject = JSON.parse(serviceAccountBfr.toString())
  const storage = new Storage({
    credentials: {
      client_id: serviceAccountObject.client_id,
      client_email: serviceAccountObject.client_email,
      private_key: serviceAccountObject.private_key
    },
    projectId: serviceAccountObject.project_id
  })

  // Declare files to download
  const files = [
    '0WytccuRHqxgIDEYPEpx/kupit/res/3067fa1f-ab42-410c-8b3a-704ede199661.webp',
    '0WytccuRHqxgIDEYPEpx/kupit/res/3d01a6bf-6acc-4d52-b40f-7b027ea25154.webp',
    // Add 25-50 files here
  ]

  // Setup folder to download files to
  const destinationDirectory = './test-files'
  if (existsSync(destinationDirectory)) { rmSync(destinationDirectory, { recursive: true }) }
  mkdirSync(destinationDirectory)

  // Download
  const bucket = storage.bucket('bucket')
  for (const file of files) {
    const options: DownloadOptions = { destination: destinationDirectory + file.substring(file.lastIndexOf('/')) }
    await bucket.file(file).download(options)
    console.log('Saved ' + file)
  }

  // We get here randomly..
  console.log('Finished')
})()
```
(The SA is custom-made for this case too, and can only read 5 files (that I got from https://developers.google.com/speed/webp/gallery1) in the troublesome bucket. To make it fail every time, I just copy & paste the 5 file names over and over into the array so the likelihood of failure goes up.)

Node is v16.18.0.
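(As an illustration of that reproduction trick, a hypothetical sketch of how the file list can be padded; the object names are placeholders, not the real gallery paths.)

```ts
// Repeat the same five object names to increase the chance of hitting the failure.
const baseFiles = [
  'gallery1/1.webp', // placeholder object names
  'gallery1/2.webp',
  'gallery1/3.webp',
  'gallery1/4.webp',
  'gallery1/5.webp'
]
const files: string[] = Array.from({ length: 10 }, () => baseFiles).flat()
```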
If I download a file the SA has no rights to, I get the correct error message. If I kill the network, I get an error too (eventually).
Have you tried adding a catch to the top-level async IIFE?

```ts
(async () => {
  // Code here
})().catch(e => {
  // Log e here
})
```
No, it seems to log nothing. Also, if I do a
In the Cloud Function logs we only have
So, @ddelgrosso1, I'm at a loss how to debug this, but I do have a failing test case here I can share with you, if that is ok? I simply don't want to copy & paste the SA, the project id, etc. here. If we need paid support for this, just say it ;)
@swftvsn sure, share the test case (without any sensitive information). If you have a support contract, I would suggest opening a case with them; they might be able to investigate more from the server / configuration side than I can. So far I have been unable to recreate this in my environments.
We've opened a ticket on the paid-support side and provided this discussion as a reference, plus the failing test case.
I can copy & paste the code here too, but can I have some other means of communication for the service account? It's a throwaway SA with only very limited rights, but it still feels really wrong to paste it here.
I think the best course of action with regard to the SA would be to provide it on the paid-support side, as that will be much more secure than trying to communicate it here or via other means (email, etc.). If the code is not sensitive, it would be fine to place it here.
So, the whole code is above already; it is missing only the SA and the real bucket name.
For completeness' sake, this is the whole code.

test-download.ts (this is the whole file; only the filenames array is truncated):

```ts
import { existsSync, mkdirSync, readFileSync, rmSync } from 'fs'
import { DownloadOptions, Storage } from '@google-cloud/storage'

(async () => {
  // Initialise Storage
  const serviceAccountBfr = await readFileSync('./google-sa.json')
  const serviceAccountObject = JSON.parse(serviceAccountBfr.toString())
  const storage = new Storage({
    credentials: {
      client_id: serviceAccountObject.client_id,
      client_email: serviceAccountObject.client_email,
      private_key: serviceAccountObject.private_key
    },
    projectId: serviceAccountObject.project_id
  })

  // Declare files to download
  const files = [
    '0WytccuRHqxgIDEYPEpx/kupit/res/3067fa1f-ab42-410c-8b3a-704ede199661.webp',
    '0WytccuRHqxgIDEYPEpx/kupit/res/3d01a6bf-6acc-4d52-b40f-7b027ea25154.webp',
    // Add 25-50 files here
  ]

  // Setup folder to download files to
  const destinationDirectory = './test-files'
  if (existsSync(destinationDirectory)) { rmSync(destinationDirectory, { recursive: true }) }
  mkdirSync(destinationDirectory)

  // Download
  const bucket = storage.bucket('bucket')
  for (const file of files) {
    const options: DownloadOptions = { destination: destinationDirectory + file.substring(file.lastIndexOf('/')) }
    await bucket.file(file).download(options)
    console.log('Saved ' + file)
  }

  // We get here randomly..
  console.log('Finished')
})()
```
Hi @swftvsn, so far another member of the team and I have had no luck reproducing this issue. I know you mentioned that the permissions on the files are fairly restrictive. Have you tried with less restrictive permissions, and if so, do you get the same result?
@swftvsn since we have not been able to reproduce this issue and have not had any similar issues raised by others, I am going to close this out. If there is additional information that might help us debug further, please feel free to reopen.
@ddelgrosso1 Trying with the raw Node.js https client, it seems the underlying reason is a ton of ECONNRESET errors. The symptom is that the download has already started, so maybe this is #1822?
One thing to investigate is why we're getting so many ECONNRESETs, both inside Google and outside Google. Still, how can I make sure that the retries work? I've inspected some of the code, and it seems it might not be trivial to swap the underlying HTTP client for another. I think the only option for now is to use our own implementation to download from Storage, which is far from optimal.
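(For reference, a minimal sketch of how the client's retry behaviour can be configured when constructing the Storage instance, using the retryOptions settings exposed by recent @google-cloud/storage versions; the specific values below are illustrative only, and whether a retry actually fires for a mid-stream ECONNRESET is exactly the behaviour in question in this thread.)

```ts
import { Storage } from '@google-cloud/storage'

// Illustrative retry settings; defaults are autoRetry: true, maxRetries: 3.
const storage = new Storage({
  retryOptions: {
    autoRetry: true,          // retry transient failures
    maxRetries: 5,            // allow a few more attempts than the default
    retryDelayMultiplier: 2,  // exponential backoff factor
    maxRetryDelay: 64,        // cap a single backoff (seconds)
    totalTimeout: 600         // give up after 10 minutes (seconds)
  }
})
```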
@swftvsn if you are getting interrupted downloads (ECONNRESETs) on files that are 5 kB to 500 kB even with the raw HTTP client, I would look at your network configuration and see if something might be forcibly closing these connections, resulting in the ECONNRESETs. I do not think this is related to #1822, as the symptom we saw in those circumstances was a long period of inactivity before timing out, not an application crash.
Trying to elaborate a bit more: are you saying that the underlying problem identified in #1822 cannot, in the event of an ECONNRESET, swallow the error once the download has started? We're observing exactly that: the download starts, some bytes are read, an ECONNRESET happens, the exception is swallowed, and the download dies silently with no retry regardless of the retry settings, according to a network packet sniffer. The client should throw the error in all cases.
We're trying to solve the root cause of the ECONNRESETs with paid cloud support. We're running totally unmodified Cloud Functions (1st gen) on the latest Node.js supported by that stack, and we don't modify the network settings in any way in our JavaScript code. We don't use any private networks etc., so this is pure default Google Cloud. I really don't know what network settings we could (or should) modify in this case, but again, the root cause of the network errors is not relevant to this library; the fact that some errors seem to be swallowed and the retries do not work is.
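(As a way to surface errors that might otherwise be swallowed mid-download, a minimal sketch that streams the object and propagates stream errors explicitly; the bucket and object names are placeholders.)

```ts
import { createWriteStream } from 'fs'
import { pipeline } from 'stream/promises'
import { Storage } from '@google-cloud/storage'

const storage = new Storage()
const file = storage.bucket('bucket').file('path/to/image.webp') // placeholders

;(async () => {
  try {
    // pipeline() rejects if either the read or the write stream emits 'error',
    // so a mid-transfer ECONNRESET should surface here instead of vanishing.
    await pipeline(
      file.createReadStream({ validation: false }),
      createWriteStream('./image.webp')
    )
    console.log('Download finished')
  } catch (err) {
    console.error('Download failed:', err)
  }
})()
```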
@swftvsn were you able to figure this out? We're experiencing pretty much exactly the same thing: a bunch of ECONNRESETs and then sudden death. It's almost as if the event loop dries up after the retry.
@ddelgrosso1 sorry about the ping on this old issue. The lack of a catch here looks suspicious to me, so I wanted to call it out just in case:
Hi @surjikal, no worries about pinging here, but if you are seeing the same or a similar issue, could you possibly open a new bug? If you could include the environment details, i.e. GKE, GCE, that would be helpful. Thanks.
Environment details

@google-cloud/storage version: 6.5.4

Steps to reproduce

And the Node.js process just dies without any trace. If I change the code above to return a buffer, I've traced it to die in file.js, in getBufferFromReadable. In that case it would print out

but never the C.

I have the files in two separate buckets, in europe-north1 and in the eu multi-region. Both exhibit the same behaviour. We see this on Cloud Functions running in europe-west1 plus two developer machines so far. How can we diagnose this further?