Storage client randomly dies while downloading a file #2093

Closed
swftvsn opened this issue Oct 24, 2022 · 31 comments
Labels
  • api: storage - Issues related to the googleapis/nodejs-storage API.
  • needs more info - This issue needs more information from the customer to proceed.
  • priority: p2 - Moderately-important priority. Fix may not be included in next release.
  • type: bug - Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@swftvsn

swftvsn commented Oct 24, 2022

Environment details

  • OS: Mac, Cloud Functions
  • Node.js version: 16.18, 19.0
  • npm version: The one installed with respective node version.
  • @google-cloud/storage version: 6.5.4

Steps to reproduce

  1. Run downloads
import { existsSync, mkdirSync, readFileSync, rmSync } from 'fs'
import { DownloadOptions, Storage } from '@google-cloud/storage'

const credentialsString = readFileSync('./google-sa.json', 'utf8') // Read SA from disk
const credentials = JSON.parse(credentialsString)
const storage = new Storage({ credentials })
const bucket = storage.bucket('bucket') // real bucket name omitted

const destinationDirectory = './test-files'
if (existsSync(destinationDirectory)) { rmSync(destinationDirectory, { recursive: true }) }
mkdirSync(destinationDirectory)

const files: string[] = [] // 25-50 object paths within the bucket
for (const file of files) {
  const options: DownloadOptions = { destination: destinationDirectory + file.substring(file.lastIndexOf('/')) }
  await bucket.file(file).download(options)
  console.log('Saved ' + file)
}

console.log('Finished')

And the Node.js process just dies without any trace. If I change the code above to return a buffer instead, I can trace the death to getBufferFromReadable in file.js:
(screenshot: the getBufferFromReadable function in file.js with debug logging added)

In that case it would print out

A
B 1
B 2

But never the C.

I have the files in two separate buckets, in europe-north1 and in multi-region eu. Both exhibit the same behaviour. We see this on Cloud Functions running in europe-west1 and on two developer machines so far.

How can we diagnose this further?
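
For reference, this is roughly what the buffer-based variant looks like, using download() without a destination so it resolves with the file contents as a Buffer (a sketch; the bucket and object names below are placeholders):

import { Storage } from '@google-cloud/storage'

(async () => {
  // Hypothetical bucket and object names, for illustration only.
  const storage = new Storage()
  const file = storage.bucket('my-bucket').file('path/to/object.webp')

  // Without a destination, download() resolves with the contents as a Buffer.
  const [contents] = await file.download()
  console.log('Downloaded ' + contents.length + ' bytes')
})().catch(e => console.error('Download failed:', e))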

@swftvsn added the priority: p2 and type: bug labels Oct 24, 2022
@product-auto-label bot added the api: storage label Oct 24, 2022
@swftvsn
Author

swftvsn commented Oct 24, 2022

While trying to diagnose this we've added

process.on('unhandledRejection', r => console.error(r))
process.on('uncaughtException', r => console.error(r))
console.log(`Process ID: ${process.pid}`)
process.on('SIGHUP', () => console.log('Received: SIGHUP'))
process.on('SIGINT', () => { console.log('Received: SIGINT'); process.exit() })
process.on('SIGQUIT', () => console.log('Received: SIGQUIT'))
process.on('SIGTERM', () => console.log('Received: SIGTERM'))
process.on('SIGUSR1', () => console.log('Received: SIGUSR1'))
process.on('SIGUSR2', () => console.log('Received: SIGUSR2'))
process.on('exit', code => console.log('EXIT', code))
const temp = process.exit
process.exit = ((code?: number) => {
  console.trace()
  process.exit = temp
  return temp(code)
}) as typeof process.exit

but nothing gets printed.
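
A couple of extra hooks that might help narrow down a silent exit (a sketch using standard Node.js diagnostics; not guaranteed to catch this particular failure):

// 'beforeExit' fires when the event loop drains without an explicit exit,
// which is a common signature of a dropped promise chain.
process.on('beforeExit', code => console.log('beforeExit, code:', code))

// Running node with strict rejection handling can also surface swallowed
// promise errors:
//   node --unhandled-rejections=strict test-download.js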

@ddelgrosso1
Contributor

Hi @swftvsn, thank you for reporting this issue. In your first example, is the exit occurring at the first file in the loop or after a certain threshold? What is the size of the files? You mention this is happening in a Cloud Function; if you run it outside of a Cloud Function, do you see the same exit behavior?

@ddelgrosso1 added the needs more info label Oct 24, 2022
@swftvsn
Author

swftvsn commented Oct 24, 2022

File sizes (the files are WebP-packed images) are between 5 KB and 500 KB. It happens on Cloud Functions and on development Mac computers. The download consists of 25 files. It exits randomly, usually during the first 10 files, but I've seen some runs download all files successfully. I've tried downloading using streams, a buffer, and a destination; no difference. I've also tried running it with md5, crc32c, and false as the integrity check, but that changes nothing either.

Usually we get some kind of error message back (missing rights, timeout, or similar). We have terabytes of data, so we also have a lot of Cloud Functions downloading and uploading successfully all the time.

In this case all the checks succeed and the download starts, but it is cut off suddenly (as far as I can tell). Even more disturbing is that the whole Node.js process goes down as a result.

@ddelgrosso1
Contributor

@swftvsn I attempted to recreate this issue in my test environment but was unsuccessful. However, I did recreate #2082 and wonder if this might somehow be related. Would it be possible for you to test with version 6.6.0 and see if you have the same issue?

@swftvsn
Author

swftvsn commented Oct 26, 2022

I can confirm that 6.6.0 does not fix my issue.

@ddelgrosso1
Contributor

Thanks @swftvsn, I will do some further tests to attempt to recreate this. I know you said none of the handlers you created are hit, but are there any other logs or stack traces you might be able to provide? Any chance you could provide a bit more of the logic / code around the calls to download?

@swftvsn
Author

swftvsn commented Oct 26, 2022

@ddelgrosso1 I have a narrowed-down, (randomly) failing test script pointed at a directory in our environment, with a service account limited to reading only those files.

I can provide the example file + SA json but not publicly - what's the best way to provide those?

@ddelgrosso1
Contributor

I don't think the SA JSON is necessary; I have accounts I can test with. I also don't need the entirety of your production code, but if you can provide a relevant snippet that might help in debugging, that would be great. If the code is sensitive, perhaps you can place the relevant portion in a private repository and add me to it?

@swftvsn
Author

swftvsn commented Oct 26, 2022

I simply suspect that the issue is related to the storage bucket, but maybe not.

I can share the code, as it is custom written for this exact scenario. Again, we can download with the exact same code from many buckets, but not from that particular one.

test-download.ts: (This is the whole file, only filenames array truncated.)

import { existsSync, mkdirSync, readFileSync, rmSync } from 'fs'
import { DownloadOptions, Storage } from '@google-cloud/storage'

(async () => {

  // Initialise Storage
  const serviceAccountBfr = readFileSync('./google-sa.json')
  const serviceAccountObject = JSON.parse(serviceAccountBfr.toString())
  const storage = new Storage({
    credentials: {
      client_id: serviceAccountObject.client_id,
      client_email: serviceAccountObject.client_email,
      private_key: serviceAccountObject.private_key
    },
    projectId: serviceAccountObject.project_id
  })

  // Declare files to download
  const files = [
    '0WytccuRHqxgIDEYPEpx/kupit/res/3067fa1f-ab42-410c-8b3a-704ede199661.webp',
    '0WytccuRHqxgIDEYPEpx/kupit/res/3d01a6bf-6acc-4d52-b40f-7b027ea25154.webp',
    // Add 25-50 files here
  ]

  // Setup folder to download files to
  const destinationDirectory = './test-files'
  if (existsSync(destinationDirectory)) { rmSync(destinationDirectory, { recursive: true }) }
  mkdirSync(destinationDirectory)

  // Download
  const bucket = storage.bucket('bucket')
  for (const file of files) {
    const options: DownloadOptions = { destination: destinationDirectory + file.substring(file.lastIndexOf('/')) }
    await bucket.file(file).download(options)
    console.log('Saved ' + file)
  }

  // We get here randomly..
  console.log('Finished')

})()

@swftvsn
Author

swftvsn commented Oct 26, 2022

(The SA is custom made for this case too and can only read 5 files (which I got from https://developers.google.com/speed/webp/gallery1) in the troublesome bucket. To make it fail consistently, I just copy and paste the 5 file names into the array over and over again so the likelihood of failure goes up.)

node is v16.18.0
'@google-cloud/storage' is 6.6.0

@swftvsn
Author

swftvsn commented Oct 26, 2022

If I download a file the SA has no rights to, I get the correct error message. If I kill the network I get an error too (eventually).

@ddelgrosso1
Contributor

Have you tried adding a .catch to the end of your IIFE to see if anything is being caught? i.e.

(async () => {
  // Code here
})().catch(e => {
  // Log e here
})

@swftvsn
Author

swftvsn commented Oct 26, 2022

No, it seems to log nothing.

Also, if I do a throw new Error('this is it') inside the async function, without the catch, it is still logged to the console.

@swftvsn
Author

swftvsn commented Oct 26, 2022

In the Cloud Functions logs we only have: Function execution took 13564 ms. Finished with status: response error.

@swftvsn
Author

swftvsn commented Oct 27, 2022

So, @ddelgrosso1, I'm at a loss as to how to debug this, but I do have a failing test case I can share with you, if that is ok? I simply don't want to copy and paste the SA, the project id, etc. here.

If we need paid support for this, just say it ;)

@ddelgrosso1
Contributor

ddelgrosso1 commented Oct 27, 2022

@swftvsn sure, share the test case (without any sensitive information). If you have a support contract, I would suggest opening a case with them. They might be able to investigate more from the server / configuration side than I can. So far I have been unable to recreate this in my environments.

@swftvsn
Author

swftvsn commented Oct 28, 2022

We've opened a ticket on the paid support side and provided this discussion as a reference, plus the failing test case.

@swftvsn
Author

swftvsn commented Oct 28, 2022

I can copy and paste the code here too, but could I have some other means of communication for the service account? It's a throwaway SA with only very limited rights, but it still feels really wrong to paste it here.

@ddelgrosso1
Contributor

I think the best course of action with regard to the SA would be to provide it on the paid side, as that will be much more secure than trying to communicate it here or via other means (email, etc.). If the code is not sensitive, it would be fine to place it here.

@swftvsn
Author

swftvsn commented Oct 28, 2022

So, the whole code is already above; it is only missing the SA and the real bucket name.

@swftvsn
Author

swftvsn commented Oct 28, 2022

For completeness' sake, this is the whole code.

test-download.ts: (This is the whole file, only filenames array truncated.)

import { existsSync, mkdirSync, readFileSync, rmSync } from 'fs'
import { DownloadOptions, Storage } from '@google-cloud/storage'

(async () => {

  // Initialise Storage
  const serviceAccountBfr = readFileSync('./google-sa.json')
  const serviceAccountObject = JSON.parse(serviceAccountBfr.toString())
  const storage = new Storage({
    credentials: {
      client_id: serviceAccountObject.client_id,
      client_email: serviceAccountObject.client_email,
      private_key: serviceAccountObject.private_key
    },
    projectId: serviceAccountObject.project_id
  })

  // Declare files to download
  const files = [
    '0WytccuRHqxgIDEYPEpx/kupit/res/3067fa1f-ab42-410c-8b3a-704ede199661.webp',
    '0WytccuRHqxgIDEYPEpx/kupit/res/3d01a6bf-6acc-4d52-b40f-7b027ea25154.webp',
    // Add 25-50 files here
  ]

  // Setup folder to download files to
  const destinationDirectory = './test-files'
  if (existsSync(destinationDirectory)) { rmSync(destinationDirectory, { recursive: true }) }
  mkdirSync(destinationDirectory)

  // Download
  const bucket = storage.bucket('bucket')
  for (const file of files) {
    const options: DownloadOptions = { destination: destinationDirectory + file.substring(file.lastIndexOf('/')) }
    await bucket.file(file).download(options)
    console.log('Saved ' + file)
  }

  // We get here randomly..
  console.log('Finished')

})()

@ddelgrosso1
Contributor

Hi @swftvsn, so far neither I nor another member of the team has had any luck reproducing this issue. I know you mentioned that you have fairly restrictive permissions on the files. Have you tried with less restrictive permissions, and if so, do you get the same result?

@ddelgrosso1
Contributor

@swftvsn since we have not been able to reproduce this issue and have not had any similar issues raised by others, I am going to close this out. If there is additional information that might help us debug further please feel free to reopen.

@swftvsn
Author

swftvsn commented Nov 17, 2022

@ddelgrosso1 Trying with the raw Node.js https client, it seems the underlying cause is a ton of ECONNRESET errors.

The symptom is that the download has already started when it dies, so maybe this is #1822?
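
A check with the raw client can look roughly like this (a sketch; the object URL is a placeholder, and a private object would additionally need an Authorization header carrying an OAuth2 access token):

import * as https from 'https'

// Placeholder URL for illustration only.
const url = 'https://storage.googleapis.com/my-bucket/path/to/object.webp'

https.get(url, res => {
  let bytes = 0
  res.on('data', chunk => { bytes += chunk.length })
  res.on('end', () => console.log('Done, ' + bytes + ' bytes, status ' + res.statusCode))
  res.on('error', err => console.error('Response error:', err)) // e.g. ECONNRESET mid-body
}).on('error', err => console.error('Request error:', err)) // e.g. ECONNRESET while connecting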

@swftvsn changed the title from "Storage client randomly dies with exit(0) while downloading a file" to "Storage client randomly dies with while downloading a file" Nov 17, 2022
@swftvsn
Author

swftvsn commented Nov 17, 2022

One thing to investigate is why we're getting so many ECONNRESETs, both inside Google and outside Google.

Still, how can I make sure that the retries work? I've inspected some of the code, and it seems it might not be trivial to swap the underlying client for another one.

I think the only option for now is to start using our own implementation to download from storage, which is far from optimal.
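
For what it's worth, a minimal sketch of a manual retry wrapper around the library's download() call, assuming the ECONNRESET actually surfaces as a rejection (which is exactly what seems to be in question here):

import { Bucket, DownloadOptions } from '@google-cloud/storage'

// Hypothetical helper: retry a single object download a few times on ECONNRESET.
async function downloadWithRetry (bucket: Bucket, name: string, options: DownloadOptions, attempts = 3): Promise<void> {
  for (let i = 1; i <= attempts; i++) {
    try {
      await bucket.file(name).download(options)
      return
    } catch (err: any) {
      const retriable = err?.code === 'ECONNRESET' || /ECONNRESET/.test(String(err))
      if (!retriable || i === attempts) throw err
      console.warn('Retrying ' + name + ' (attempt ' + i + '): ' + String(err))
    }
  }
}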

@ddelgrosso1
Contributor

@swftvsn if you are getting interrupted downloads (ECONNRESETs) on files that are 5 KB to 500 KB even using the raw HTTP client, I would look at your network configuration and see if something might be forcibly closing these connections, thereby resulting in the ECONNRESETs.

I do not think this would be related to #1822, as the symptoms we saw in those circumstances were a long period of inactivity before timing out, not an application crash.

@swftvsn changed the title from "Storage client randomly dies with while downloading a file" to "Storage client randomly dies while downloading a file" Nov 19, 2022
@swftvsn
Author

swftvsn commented Nov 19, 2022

I do not think this would be related to #1822, as the symptoms we saw in those circumstances were a long period of inactivity before timing out, not an application crash.

Trying to elaborate a bit more: are you saying that the underlying problem identified in #1822 can't, in the event of ECONNRESETs, swallow the error once the download has started? (We're observing exactly that: the download starts, some bytes are read, an ECONNRESET happens, the exception is swallowed, the download dies silently, and no retry is attempted regardless of the retry settings, according to a network packet sniffer.)

The client should throw the error in all cases.
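
One way to see where the error disappears is to bypass download() and pipe a read stream manually, with explicit error listeners on both ends (a sketch; bucket, object, and local path are placeholders):

import { createWriteStream } from 'fs'
import { Storage } from '@google-cloud/storage'

const storage = new Storage()
const source = storage.bucket('my-bucket').file('path/to/object.webp').createReadStream()
const sink = createWriteStream('./test-files/object.webp')

source.on('error', err => console.error('Read stream error:', err)) // an ECONNRESET should surface here
sink.on('error', err => console.error('Write stream error:', err))
sink.on('finish', () => console.log('Write finished'))

source.pipe(sink)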

@swftvsn
Author

swftvsn commented Nov 19, 2022

@swftvsn if you are getting interrupted downloads (ECONNRESETs) on files that are 5 KB to 500 KB even using the raw HTTP client, I would look at your network configuration and see if something might be forcibly closing these connections, thereby resulting in the ECONNRESETs.

We're trying to solve the root cause of the ECONNRESETs with paid cloud support. We're running a completely unmodified Cloud Functions (1st gen) environment on the latest Node.js supported by that stack, and we don't modify the network settings in any way in our JavaScript code.

We don't use any private networks etc., so this is pure default Google Cloud. I really don't know what network settings we could (or should) modify in this case. But again, the root cause of the network errors is not relevant to this library; the fact that some errors seem to be swallowed and the retries don't work is.

@surjikal
Contributor

@swftvsn were you able to figure this out? We're experiencing pretty much the exact same thing: seeing a bunch of ECONNRESETs and then sudden death. It's almost like the event loop dries up after a retry.

@surjikal
Contributor

@ddelgrosso1 sorry about the ping on this old issue.
I am here because I am currently experiencing this issue.

The lack of a catch here looks sus to me, so I wanted to call it out just in case:
https://github.com/googleapis/nodejs-storage/blame/main/src/file.ts#L1551
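
For context, the general stream-to-buffer pattern usually ties the promise to the stream's error event so a failure rejects instead of being dropped (a generic sketch, not the library's actual getBufferFromReadable implementation):

import { Readable } from 'stream'

// Generic pattern: collect a Readable into a Buffer and reject on stream
// errors so the failure cannot be silently swallowed.
function readableToBuffer (readable: Readable): Promise<Buffer> {
  return new Promise((resolve, reject) => {
    const chunks: Buffer[] = []
    readable.on('data', chunk => chunks.push(Buffer.from(chunk)))
    readable.on('error', reject)
    readable.on('end', () => resolve(Buffer.concat(chunks)))
  })
}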

@ddelgrosso1
Contributor

Hi @surjikal, no worries about pinging here, but if you are seeing the same or a similar issue, could you possibly open a new bug? If you could include the environment details (e.g. GKE, GCE), that would be helpful. Thanks.
