feat: MPU for transfer manager #2192

Merged: ddelgrosso1 merged 7 commits into googleapis:main from tm-chunk-up on Jul 12, 2023

ddelgrosso1
Contributor

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

@ddelgrosso1 ddelgrosso1 requested review from a team as code owners May 4, 2023 19:55
@product-auto-label product-auto-label bot added size: l Pull request size is large. api: storage Issues related to the googleapis/nodejs-storage API. labels May 4, 2023
@ddelgrosso1 ddelgrosso1 added the owlbot:run Add this label to trigger the Owlbot post processor. label May 4, 2023
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label May 4, 2023
Contributor

@danielduhh danielduhh left a comment

Very exciting! Just had some initial thoughts/questions...

});
for await (const curChunk of readStream) {
  if (promises.length >= maxQueueSize) {
    await Promise.all(promises);
Contributor

What does this do? Execute all the parts? Should this get returned to the caller?

Contributor Author

This is similar to the changes I made for deleteFiles. p-limit only constrains the number of running promises, not the number of promises held in memory. I added a queue here so that, when a user uploads a large file, we don't attempt to hold all of its chunks in memory at once. By default we hold only as many chunks in memory as there are executing promises.
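For readers following the thread, here is a minimal sketch of the bounded-queue idea described above. It is an illustration only, not the code in this PR: `uploadWithBoundedQueue` and `UploadPart` are hypothetical names, the real implementation also runs uploads through p-limit, and the queue reset shown here is an assumption about how the drained batch is discarded.

```typescript
// Hypothetical stand-in for a single part upload; not the library's API.
type UploadPart = (partNumber: number, chunk: Uint8Array) => Promise<void>;

/**
 * Reads chunks from an async iterable and starts an upload for each one,
 * but never keeps more than `maxQueueSize` pending uploads (and therefore
 * chunks) referenced at the same time. Resolves with the number of parts.
 */
async function uploadWithBoundedQueue(
  chunks: AsyncIterable<Uint8Array>,
  uploadPart: UploadPart,
  maxQueueSize: number
): Promise<number> {
  let promises: Promise<void>[] = [];
  let partNumber = 0;

  for await (const chunk of chunks) {
    // When the queue is full, wait for the whole batch to finish before
    // reading another chunk, then drop the references to the old batch.
    if (promises.length >= maxQueueSize) {
      await Promise.all(promises);
      promises = [];
    }
    partNumber += 1;
    promises.push(uploadPart(partNumber, chunk));
  }

  // Wait for the final, possibly partial, batch.
  await Promise.all(promises);
  return partNumber;
}
```

The trade-off in this sketch is that a slow part stalls the whole batch; in exchange, memory use is capped at roughly `maxQueueSize` chunks regardless of file size.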

Contributor

Makes sense. Maybe add a comment to make it more clear?

Contributor Author

Added a comment to explain the logic.

  );
}
await Promise.all(promises);
await mpuHelper.completeUpload();
Contributor

What's the return statement here?

Contributor Author

The return type is Promise<void>; in the success case there is no return value for the user. If something goes wrong, we throw an exception containing the error, the uploadId, and a mapping of the parts that were uploaded successfully. This allows the user to call again, supplying the uploadId and parts, so that we can resume from where the failure occurred.
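A rough sketch of the resume pattern described above may help. Every name in it (`MultiPartUploadError`, `ResumeState`, `uploadParts`, the placeholder upload ID) is hypothetical and chosen for illustration; the error type and method signature in this PR may differ.

```typescript
// Hypothetical error type carrying what a caller needs in order to resume.
class MultiPartUploadError extends Error {
  readonly uploadId: string;
  readonly partsMap: Map<number, string>; // part number -> ETag

  constructor(message: string, uploadId: string, partsMap: Map<number, string>) {
    super(message);
    this.name = 'MultiPartUploadError';
    this.uploadId = uploadId;
    this.partsMap = partsMap;
  }
}

// Hypothetical resume input: values taken from a previous failure.
interface ResumeState {
  uploadId?: string;
  partsMap?: Map<number, string>;
}

/**
 * Uploads parts 1..totalParts, skipping any part already present in
 * `resume.partsMap`. On failure, throws an error that carries the upload ID
 * and the parts completed so far.
 */
async function uploadParts(
  totalParts: number,
  uploadPart: (partNumber: number) => Promise<string>,
  resume: ResumeState = {}
): Promise<Map<number, string>> {
  // Placeholder: a real implementation would initiate the upload remotely.
  const uploadId = resume.uploadId ?? 'placeholder-upload-id';
  const partsMap = new Map<number, string>(resume.partsMap ?? []);

  for (let n = 1; n <= totalParts; n++) {
    if (partsMap.has(n)) continue; // already uploaded in an earlier attempt
    try {
      partsMap.set(n, await uploadPart(n));
    } catch (e) {
      const message = e instanceof Error ? e.message : String(e);
      throw new MultiPartUploadError(message, uploadId, partsMap);
    }
  }
  return partsMap;
}
```

A caller would catch the error, keep its `uploadId` and `partsMap`, and pass both back in on the next attempt so that only the missing parts are sent again.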

Contributor

Ah I see. Is there no value in returning the response from complete upload?

Contributor Author

@ddelgrosso1 ddelgrosso1 May 5, 2023

I will change it to return the full response, just to be in line with the raw API.

Contributor Author

Now returns Promise<GaxiosResponse | undefined>. The GaxiosResponse is the full API response.

* @private
* @experimental
*/
class XMLMultiPartUploadHelper implements MultiPartUploadHelper {
Contributor

should this go in a separate file?

Contributor Author

I can move this to a separate file. The only problem is that I will have to export it, which means a user could come along, import it, and use it directly. I was attempting to avoid that situation. It depends on what people think and whether or not we are OK with saying "don't use this directly".

*
* @returns {Promise<TestResult>} A promise that resolves containing information about the test results.
*/
-async function performDownloadFileInChunksTest(): Promise<TestResult> {
+async function performChunkUploadDownloadTest(): Promise<TestResult[]> {
Contributor

What's the reasoning for combining these? Simplicity?

Contributor Author

Workload 8 is supposed to be an upload and download test, so I combined them.

import {CRC32C} from './crc32c';
import {GoogleAuth} from 'google-auth-library';
import {XMLParser, XMLBuilder} from 'fast-xml-parser';
Contributor

Add to package.json? It would be interesting to see the level of effort for adding this to one of our shared libs (gaxios?).

Contributor Author

I added it: https://github.com/googleapis/nodejs-storage/pull/2192/files#diff-7ae45ad102eab3b6d7e7896acd08c427a9b25b346470d7bc6507b6481575d519R62. I'm sure we could write something ourselves, but since this is a one-off usage of the XML API, I figured it probably wasn't worth doing ourselves.
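To give a sense of what "writing something ourselves" might involve, here is a small sketch that builds the request body for completing a multipart upload by hand. This is not what the PR does (it uses fast-xml-parser). The function names are hypothetical, and the element names (`CompleteMultipartUpload`, `Part`, `PartNumber`, `ETag`) are assumed from the XML multipart upload API's request format.

```typescript
// Escape the characters that are unsafe inside XML element content.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

/**
 * Builds a CompleteMultipartUpload body from a map of part number -> ETag.
 * Parts are emitted in ascending part-number order.
 */
function buildCompleteUploadBody(parts: Map<number, string>): string {
  const sorted = Array.from(parts.entries()).sort(([a], [b]) => a - b);
  const xmlParts = sorted
    .map(
      ([partNumber, etag]) =>
        `<Part><PartNumber>${partNumber}</PartNumber>` +
        `<ETag>${escapeXml(etag)}</ETag></Part>`
    )
    .join('');
  return `<CompleteMultipartUpload>${xmlParts}</CompleteMultipartUpload>`;
}
```

Building the request is the easy half; parsing responses and error documents robustly is where a library earns its keep, which supports the decision to take the dependency for a one-off use.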

Contributor

@danielduhh danielduhh left a comment

looks good. One question on returning the response of complete upload

@ddelgrosso1 ddelgrosso1 added the owlbot:run Add this label to trigger the Owlbot post processor. label May 5, 2023
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label May 5, 2023
@ddelgrosso1 ddelgrosso1 requested review from andrewsg and tritone May 5, 2023 21:36
@ddelgrosso1 ddelgrosso1 added the owlbot:run Add this label to trigger the Owlbot post processor. label May 5, 2023
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label May 5, 2023
@ddelgrosso1 ddelgrosso1 added the owlbot:run Add this label to trigger the Owlbot post processor. label May 16, 2023
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label May 16, 2023
@ddelgrosso1 ddelgrosso1 added the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label May 24, 2023
@ddelgrosso1 ddelgrosso1 removed the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Jul 10, 2023
@ddelgrosso1 ddelgrosso1 merged commit ae83421 into googleapis:main Jul 12, 2023
@ddelgrosso1 ddelgrosso1 deleted the tm-chunk-up branch July 12, 2023 15:10
Labels
api: storage Issues related to the googleapis/nodejs-storage API. size: l Pull request size is large.
2 participants