storage: resumable uploads #298
What strategy will be taken if writing to disk is not possible due to …?
I guess fall back to non-resumable behavior. What would make the most sense?
Exposing the id and other necessary information used to resume the upload sounds reasonable to me.
The boto approach might be interesting: the library will automatically retry x times on failures; a tracker file is optional and can be used to save/read the state.
That boto approach makes sense. We should be able to do the same thing. If a tracker file is provided, we first read it to see if there's already data to use to resume the upload. If not, we store the state of the upload in it and are able to re-use it the next time through.
If the option is passed, the file is written, and it's then removed if the upload is successful or if any non-retryable exception is raised.
Looks like it's just a line in the tracker file that contains the URI.
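Putting those two comments together, a minimal sketch of the boto-style flow might look like the following. `createSession` and `putFromByte` are hypothetical stand-ins for the actual session-initiation and upload requests, not real gcloud-node API:

```js
var fs = require('fs');

// Minimal sketch of the boto-style tracker-file flow described above.
// The tracker file holds a single line: the resumable session URI.
function uploadWithTracker(filePath, trackerFile, done) {
  fs.readFile(trackerFile, 'utf8', function(err, contents) {
    var sessionUri = !err && contents.trim();

    if (sessionUri) {
      // A previous attempt left a session URI behind; resume it.
      return putFromByte(sessionUri, filePath, onComplete);
    }

    // No usable tracker file: start a new session and record its URI
    // so a later attempt can resume.
    createSession(filePath, function(err, uri) {
      if (err) return done(err);
      fs.writeFile(trackerFile, uri, function() {
        putFromByte(uri, filePath, onComplete);
      });
    });
  });

  function onComplete(err) {
    if (!err || err.retryable === false) {
      // Success or a non-retryable failure: the tracker file is no
      // longer useful, so remove it (matching the boto behavior).
      fs.unlink(trackerFile, function() {});
    }
    done(err);
  }
}
```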
SGTM 👍
After thinking more on this, I think it might be beneficial to still attempt to handle resumable uploads automatically, without a tracker file having to be provided:

```js
fs.createReadStream("./photos.zip")
  .pipe(file.createWriteStream())
  .on("error", callback)

bucket.upload("./photos.zip", callback)
```
Handling the streaming upload should be possible by counting bytes: https://cloud.google.com/storage/docs/concepts-techniques#unknownresumables

**edit:** can't read. JSON docs: https://cloud.google.com/storage/docs/json_api/v1/how-tos/upload#resumable
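For reference, the status check described in that JSON API doc boils down to an empty `PUT` against the session URI. A rough sketch, assuming `sessionUri` is whatever the session-initiation response returned:

```js
var https = require('https');
var url = require('url');

// Ask the API how many bytes it already has for a resumable session:
// an empty PUT with "Content-Range: bytes */<total>" (or "bytes */*"
// when the total size isn't known yet). A 308 response with a Range
// header like "bytes=0-12345" means the next request should resume at
// byte 12346; a 200/201 means the upload already completed.
function getBytesReceived(sessionUri, totalBytes, callback) {
  var parsed = url.parse(sessionUri);

  var req = https.request({
    method: 'PUT',
    hostname: parsed.hostname,
    path: parsed.path,
    headers: {
      'Content-Length': 0,
      'Content-Range': 'bytes */' + (totalBytes || '*')
    }
  }, function(res) {
    res.resume(); // drain the response body

    if (res.statusCode === 308) {
      var range = res.headers.range; // e.g. "bytes=0-12345"
      var lastByte = range ? parseInt(range.split('-')[1], 10) : -1;
      return callback(null, lastByte + 1);
    }

    if (res.statusCode === 200 || res.statusCode === 201) {
      return callback(null, totalBytes); // nothing left to send
    }

    callback(new Error('unexpected status code: ' + res.statusCode));
  });

  req.on('error', callback);
  req.end();
}
```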
Yeah, as mentioned the tracker file is optional.
How is this different from using a tracker file? Can't we just store the ID in memory and on failure emit it?
By defaulting to "we'll store it in a file for you" but allowing "give us a file to store it in", I think that would create a confusing situation where it's not clear when it makes sense to provide your own file. The file is useless outside of gcloud-node, as far as I know.

After playing with this today, I can't even think of a good reason to return the resumable ID to the user; with it alone, it's not very handy. If they don't use it with gcloud-node, they need to somehow assemble their own request (authenticate the headers, read the last byte sent, discard unused bytes from the retried upload data stream, etc.).

I'd rather just handle everything and emit an error, as we currently do, if the upload fails for any reason. The user will try again, and we'll automatically pick up where we left off, without them having to think about it. The configstore is preferred because the user won't even need to know there is a config file.
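To make that "automatic" behavior concrete, here is roughly what the configstore-backed flow could look like. The key scheme and the `createSession`/`putFromByte` helpers are illustrative, not the real gcloud-node API:

```js
var Configstore = require('configstore');

// Sketch of the fully automatic approach: stash the resumable session
// URI in a configstore keyed by bucket and file name, so a retried
// upload silently picks up where the failed one left off.
var store = new Configstore('gcloud-node');

function upload(bucketName, fileName, filePath, done) {
  var key = bucketName + '/' + fileName;
  var sessionUri = store.get(key);

  function finish(err) {
    if (!err) {
      store.del(key); // or store.delete(key), depending on the configstore version
    }
    done(err);
  }

  if (sessionUri) {
    return putFromByte(sessionUri, filePath, finish); // transparent resume
  }

  createSession(bucketName, fileName, function(err, uri) {
    if (err) return done(err);
    store.set(key, uri);
    putFromByte(uri, filePath, finish);
  });
}
```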
SGTM. |
Big files take big time. Big time means big chances for failure. Currently, if something goes wrong during an upload at 95%, the user is forced to re-upload starting from 0%. Using the resumable upload capabilities of the storage API, we should be able to handle this for our users.
It's not exactly trivial, however. My current idea for a solution involves storing a configuration file on the user/server's drive, where we can store tokens/state of uploads. Yeoman happens to have a tool to enable this for us: https://github.com/yeoman/configstore
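For context, configstore is a tiny get/set wrapper around a JSON file; something like:

```js
var Configstore = require('configstore');

// Values are persisted to a JSON file in the user's config directory
// (e.g. ~/.config/configstore/gcloud-node.json on Linux), so state
// survives process restarts without touching the project tree.
var conf = new Configstore('gcloud-node');

conf.set('example-key', 'example-value'); // illustrative key/value
console.log(conf.get('example-key'));     // => 'example-value'
```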
Unless there are any objections, or more ideas for solutions, I will try to have a PR this week with this functionality.