
Resume an incomplete upload #2162

Closed
MarkMurphy opened this issue Mar 13, 2023 · 8 comments · Fixed by #2333
Assignees
danielbankhead

Labels
api: storage - Issues related to the googleapis/nodejs-storage API.
priority: p3 - Desirable enhancement or fix. May not be included in next release.
type: feature request - 'Nice-to-have' improvement, new feature or different behavior or design.

Comments

@MarkMurphy

I have a node server/api that sits in front of cloud storage. It authenticates the user and requires some additional metadata before streaming the upload into cloud storage.

I'm trying to abstract away and hide the fact that we're using GCS from our clients/consumers. I'm therefore building an API that mirrors the GCS resumable upload API and will allow clients to chunk and resume an interrupted file upload.

Please correct me if I'm wrong, but this client library doesn't seem to accommodate this use case.

A client first sends a POST request to my API to initiate a resumable upload, e.g. POST /files.

The server then starts a resumable upload and places an object in the bucket. We store the session URI in our database for later retrieval. We then respond with an upload id that we generate.

A client may then resume an upload via PATCH /files/{id} with e.g. Content-Range: bytes 456-987/1234, but there's no way for me to resume an already-started resumable upload using the information in that header, given how the storage library is currently written.

Some example code:

```js
// Resume an interrupted/incomplete upload
server.patch('/files/:id', async (req, res) => {
  const headers = req.headers;
  const contentLength = parseContentLength(headers['content-length']);
  const contentRange = parseContentRange(headers['content-range']);
  const rangeStart = contentRange.rangeStart ?? 0;

  // Nowhere to pass these that I can tell
  // const rangeEnd = contentRange.rangeEnd;
  // const totalSize = contentRange.size;

  const id = req.params.id;
  const record = await db.files.get(id);

  if (!record) {
    throw { statusCode: 404, message: `No such file: ${id}` };
  }

  if (record.uploadSessionURIExpiresAt <= Date.now()) {
    throw { statusCode: 410, message: `Upload session expired: ${id}` };
  }

  const gcsObjectPath = `${gcsObjectPrefix}/${id}`;
  const object = storage.bucket(gcsBucket).file(gcsObjectPath);

  const uploadStream = object.createWriteStream({
    resumable: true,
    // no way to say the offset is the next byte after what we expect has already been uploaded
    // no way to tell the library the totalSize if it is now known
    offset: rangeStart,
    uri: record.uploadSessionURI,
  });

  await pipe(req.raw, uploadStream);

  // Produces the following error:
  //
  //   The uploaded data did not match the data from the server.
  //   As a precaution, the file has been deleted.
  //   To be sure the content is the same, you should try uploading the file again.

  res.status(200).send({ ok: true });
});
```
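For reference, `parseContentRange` above is an application helper, not part of the library. A minimal sketch, assuming the header shapes GCS itself uses (`bytes 456-987/1234`, the wildcard range `bytes */1234`, and the unknown-size form `bytes 0-9/*`), might look like:

```javascript
// Hypothetical parseContentRange helper, not part of @google-cloud/storage.
function parseContentRange(header) {
  const m = /^bytes (?:(\d+)-(\d+)|\*)\/(?:(\d+)|\*)$/.exec(header ?? '');
  if (!m) {
    throw { statusCode: 400, message: `Invalid Content-Range: ${header}` };
  }
  const [, start, end, size] = m;
  return {
    rangeStart: start === undefined ? undefined : Number(start),
    rangeEnd: end === undefined ? undefined : Number(end),
    size: size === undefined ? undefined : Number(size),
  };
}
```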

Looking at the internals, it seems the behaviour is mainly determined by startUploading in resumable-uploads.ts.

@MarkMurphy MarkMurphy added priority: p3 Desirable enhancement or fix. May not be included in next release. type: question Request for information or clarification. Not an issue. labels Mar 13, 2023
@product-auto-label product-auto-label bot added the api: storage Issues related to the googleapis/nodejs-storage API. label Mar 13, 2023
@danielbankhead danielbankhead self-assigned this Mar 14, 2023
@danielbankhead (Contributor)

Hey @MarkMurphy, which version of the library are you running?

> I'm trying to abstract away and hide the fact that we're using GCS from our clients/consumers. I'm therefore building an API that mirrors the resumable upload API and will allow clients to chunk and resume an interrupted file upload.
>
> Please correct me if I'm wrong, but this client library doesn't seem to accommodate this use case.

We support this use case.

@MarkMurphy (Author) commented Mar 14, 2023

@danielbankhead

> which version of the library are you running?

v6.9.4

> We support this use case.

Ok great, happy to provide any information you need to figure this out. I'm able to run a debugger during a request, so I can inspect anything you want.

There are a couple of places in the library code that seem relevant here. All are in the body of the startUploading method.

```ts
if (this.numBytesWritten < this.offset) {
  // 'fast-forward' to the byte where we need to upload.
  // only push data from the byte after the one we left off on
  const fastForwardBytes = this.offset - this.numBytesWritten;
  for await (const _chunk of this.upstreamIterator(fastForwardBytes)) {
    _chunk; // discard the data up until the point we want
  }
  this.numBytesWritten = this.offset;
}
```

this.numBytesWritten is always initialized to 0 (aside: it's set to 0 twice in the constructor). When an upload is being resumed (some data has already been persisted in a previous client request), if I pass an offset (i.e. range start) for the next byte of data, this conditional will be truthy because this.numBytesWritten is 0 and hasn't accounted for the data already written to storage in a previous request.

I think this block of logic assumes a different scenario than the one we're in, like needing to resend a chunk of data that failed to persist between the server and GCS. As a result, it skips over the first offset bytes of the data the client sent.

Maybe I need to use multiChunkMode, but that code looks like it has issues too; it's going to be affected by the "fast-forwarded" this.numBytesWritten value I described above.
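The discard behaviour described above can be modelled in isolation. The following is a simplified model of the fast-forward loop, not the library's actual code (it discards whole chunks rather than exact byte counts): because numBytesWritten starts at 0 on a fresh stream, an offset of N throws away the first N bytes of whatever the client just sent, rather than bytes GCS already has.

```javascript
// Simplified model of startUploading's fast-forward behaviour.
async function* upstream(chunks) {
  for (const c of chunks) yield Buffer.from(c);
}

async function bytesActuallyUploaded(chunks, offset) {
  let numBytesWritten = 0;
  const uploaded = [];
  for await (const chunk of upstream(chunks)) {
    if (numBytesWritten < offset) {
      numBytesWritten += chunk.length; // 'fast-forward': chunk is discarded
      continue;
    }
    uploaded.push(chunk);
    numBytesWritten += chunk.length;
  }
  return Buffer.concat(uploaded).toString();
}
```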

Here's an example repo with the basic code that demonstrates my current situation:
https://github.com/MarkMurphy/example-files-api

@danielbankhead (Contributor)

Hey @MarkMurphy, I believe the culprit behind the error is the validation option in createWriteStream. Since the stream is resuming, the server already has a CRC32C value that the HashStreamValidator is unaware of during the stream upload:

nodejs-storage/src/file.ts

Lines 1876 to 1885 in 0501127

```ts
let crc32c = true;
let md5 = false;
if (typeof options.validation === 'string') {
  options.validation = options.validation.toLowerCase();
  crc32c = options.validation === 'crc32c';
  md5 = options.validation === 'md5';
} else if (options.validation === false) {
  crc32c = false;
}
```

nodejs-storage/src/file.ts

Lines 1913 to 1918 in 0501127

```ts
const hashCalculatingStream = new HashStreamValidator({
  crc32c,
  md5,
  crc32cGenerator: this.crc32cGenerator,
  updateHashesOnly: true,
});
```

I will look into this further in the coming days (likely tomorrow). In the meantime, feel free to try setting {validation: false} in the options.
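Concretely, the suggestion amounts to something like the following options object. This is a sketch: the stand-in `record` and `rangeStart` values replace what the PATCH handler reads from the database and the Content-Range header.

```javascript
// Stand-in values so this fragment runs on its own; in the real handler they
// come from the DB record and the parsed Content-Range header.
const record = { uploadSessionURI: 'https://storage.googleapis.com/upload/...' };
const rangeStart = 456;

const writeStreamOptions = {
  resumable: true,
  uri: record.uploadSessionURI,
  offset: rangeStart,
  // Skip client-side CRC32C/MD5: on a resumed stream the validator only sees
  // the new bytes, so it can never match the server's hash of the full object.
  validation: false,
};
// const uploadStream = object.createWriteStream(writeStreamOptions);
```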

@MarkMurphy (Author) commented Mar 14, 2023

@danielbankhead Thanks, passing validation: false allows the request to succeed, but it doesn't write the additional data to the storage object because of the logic I mentioned above that discards the request data up to offset bytes. This essentially creates a 0 content-length request, because all the data has been discarded.

@danielbankhead (Contributor)

@MarkMurphy My mistake - I misunderstood the original issue. I'll work on adding this functionality shortly.

@danielbankhead danielbankhead added type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. and removed type: question Request for information or clarification. Not an issue. labels Apr 10, 2023
@jthomasprogrammer
Hi @danielbankhead, is there an update on this request? I ended up running into the same thing as @MarkMurphy.

@danielbankhead (Contributor)

Hey @jthomasprogrammer, I'm planning to add this functionality this quarter or early next quarter.

@jthomasprogrammer
Gotcha, thanks @danielbankhead!
