This repository has been archived by the owner on Nov 8, 2024. It is now read-only.
This pull request captures an experiment with parallel multipart upload and Google Cloud Storage Composite Objects.
This change set includes a new client, `MultipartUploader`, that encapsulates splitting an incoming stream into chunks, uploading those chunks in parallel, and finally issuing a compose request to combine them at close.
Some observations:
- `byte[]` rather than an `InputStream`, and it does so for an interesting reason. The `byte[]` variant can automatically retry on some failure cases and is recommended; the `InputStream` variant cannot, since the stream can only be used once.
- `chunkSize` to allow deployers to customize based on workloads (some repository formats have larger assets than others). If the `chunkSize` is too low, you could easily run into a circumstance with 31 chunks of a small size and 1 giant chunk that we have to wait for.
Does it perform better or worse than single thread?
I set up a single NXRM instance with the plugin on an n1-highmem-4 GCP instance, and used another GCP instance in the same VPC to generate a diverse workload. I ran the same workload with the current master build (using single-threaded uploads).
The results? Higher CPU utilization and lower overall network I/O.
I don't believe this to be a suitable approach for NXRM blobstores at this time.
This was a valuable experiment. I will be porting the increase to the connections per route configuration and some of the integration tests to the final product. I do not intend to merge this change set at this time; opening this PR just to capture the results. Relates to #1.
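
To make the `chunkSize` observation above concrete, here is a minimal sketch of how chunk sizes fall out when a single compose request is capped at 32 source objects. It is not the code from this change set: the `ChunkPlan` class and its `plan` method are hypothetical names, and the sketch assumes the total length is known up front, whereas the uploader works from an incoming stream of unknown length.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not part of this change set. */
final class ChunkPlan {
    // A single Google Cloud Storage compose request accepts at most 32 source objects.
    static final int COMPOSE_LIMIT = 32;

    /**
     * Returns the size of each chunk for a payload of totalBytes.
     * The first 31 chunks are at most chunkSize bytes each; whatever
     * remains goes into the 32nd and final chunk.
     * Assumption: totalBytes is known in advance (a simplification).
     */
    static List<Long> plan(long totalBytes, long chunkSize) {
        if (totalBytes < 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("totalBytes must be >= 0 and chunkSize > 0");
        }
        List<Long> sizes = new ArrayList<>();
        long remaining = totalBytes;
        while (remaining > 0) {
            boolean lastSlot = sizes.size() == COMPOSE_LIMIT - 1;
            long size = lastSlot ? remaining : Math.min(chunkSize, remaining);
            sizes.add(size);
            remaining -= size;
        }
        return sizes;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        List<Long> sizes = plan(1024 * mib, 5 * mib);
        long last = sizes.get(sizes.size() - 1);
        System.out.println(sizes.size() + " chunks, last = " + (last / mib) + " MiB");
    }
}
```

With a 1 GiB payload and a 5 MiB `chunkSize`, this prints `32 chunks, last = 869 MiB`: 31 small chunks that can upload in parallel, and one large chunk that everything else waits on. A larger `chunkSize` spreads the same payload more evenly across the available slots.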