Improve throughput by 50+% via parallel uploads and disabling gzip compression #53
Previously, a multipart upload (MPU) would succeed in writing all the chunks and composing them into the desired destination. The chunkNames argument, however, included the final destination name (as it is used for the "first" chunk); when deferredCleanup finally completed, the destination file would also be deleted. This corrects the behavior by excluding the first chunk name in deferredCleanup via a filter on the argument.

This also includes a fix for a possible OutOfMemoryError condition raised by blobs with a very large final chunk. We avoid the issue by skipping the call to readChunk (and the Arrays.copyOfRange on the chunk) for the final chunk. Instead, we pass the remaining InputStream to the storage client, with the unfortunate side effect of calling a deprecated method. Per the docs, the storage#create method is deprecated because it is not retryable. That means we may have failed uploads where the first 31 chunks succeed and the last fails without retry; that highlights the importance of the tuning option and the log messaging alerting of the issue.

Also includes:
* an integration test to mimic the storage facet's write-temp-and-move behavior
* new tests on the uploader with larger blobs
* an increase of maxConnectionsPerRoute to equal maxTotalConnections
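A minimal sketch of the cleanup fix described above; the method shape and surrounding names (bucket, destination, chunkNames) are assumptions for illustration, not the plugin's exact internals:

import java.util.List;

import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.Storage;

// Hypothetical shape of the corrected cleanup: the first "chunk" is written
// directly under the destination blob name, so it must be excluded before
// deleting the temporary chunk objects.
private void deferredCleanup(final Storage storage, final String bucket,
                             final String destination, final List<String> chunkNames) {
  chunkNames.stream()
      .filter(chunkName -> !chunkName.equals(destination)) // keep the composed blob
      .forEach(chunkName -> storage.delete(BlobId.of(bucket, chunkName)));
}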
…y/nexus-blobstore-google-cloud into multipart-upload
Recent profiling sessions revealed that the google-cloud-storage Java library gzip-compresses streams on write to the bucket. Since the overwhelming majority of content we write through this plugin is already compressed, this has no benefit and results in high CPU utilization and low network throughput. By disabling gzip compression on the fixed-size chunk writes, we observe a 30-50% increase in network throughput.

The edge case that this commit does not address is the final chunk; since we don't know its size and we can't chunk it, we have to call the deprecated storage write method that takes an InputStream. That variant has no option available to disable gzip compression.
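As a hedged illustration of the two write paths (identifiers like bucket, chunkName, chunkBytes, finalChunkName, and remainingStream are placeholders):

import com.google.cloud.storage.Blob;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.Storage.BlobTargetOption;

// Fixed-size chunks held as byte[] can opt out of gzip compression:
Blob chunk = storage.create(
    BlobInfo.newBuilder(bucket, chunkName).build(),
    chunkBytes, BlobTargetOption.disableGzipContent());

// The final chunk of unknown size falls back to the deprecated InputStream
// variant, which exposes no equivalent option (and is not retried by the client):
Blob finalChunk = storage.create(
    BlobInfo.newBuilder(bucket, finalChunkName).build(),
    remainingStream);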
src/main/java/org/sonatype/nexus/blobstore/gcloud/internal/MultipartUploader.java
Looks good to me, just some minor suggestions. +1 for the docs!
chunk = readChunk(current);
}
else {
  log.info("Upload for {} has hit Google Cloud Storage multipart-compose limits; " +
Since composeLimitHit is being kept track of, it would probably be useful to log it here.
Fixed in e394e05
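For reference, a sketch of how the counter could be folded into that message - assuming composeLimitHit is an existing AtomicLong field and destination stands in for the blob path being uploaded; the actual wording in e394e05 may differ:

import java.util.concurrent.atomic.AtomicLong;

private final AtomicLong composeLimitHit = new AtomicLong(0);

// On the fallback path, record the occurrence and include the running count.
long occurrences = composeLimitHit.incrementAndGet();
log.info("Upload for {} has hit Google Cloud Storage multipart-compose limits " +
    "({} occurrences since startup); consider increasing the chunk size setting",
    destination, occurrences);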
private final ListeningExecutorService executorService = MoreExecutors.listeningDecorator(
    Executors.newCachedThreadPool(
        new ThreadFactoryBuilder().setNameFormat("nexus-google-cloud-storage-multipart-upload-%d")
Could use a NexusThreadFactory.
Done in e394e05
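For completeness, the suggested swap might look like this - a sketch only; I'm assuming NexusThreadFactory's (poolId, threadGroupName) constructor from nexus-public, so verify the exact signature against the NXRM version in use:

import java.util.concurrent.Executors;

import com.google.common.util.concurrent.ListeningExecutorService;
import com.google.common.util.concurrent.MoreExecutors;
import org.sonatype.nexus.thread.NexusThreadFactory;

// Same cached pool as before, but with Nexus' own thread factory
// (pool id and group name here are illustrative).
private final ListeningExecutorService executorService = MoreExecutors.listeningDecorator(
    Executors.newCachedThreadPool(
        new NexusThreadFactory("gcs-multipart-upload", "nexus-google-cloud-storage")));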
src/main/java/org/sonatype/nexus/blobstore/gcloud/internal/MultipartUploader.java
@Override
protected void doStop() throws Exception {
  executorService.shutdown();
Suggestion: Logging at info feels a bit spammy. Maybe log at debug and log something at info if awaitTermination times out?
Glad you pointed this out. It felt weird to wait a few minutes on orderly shutdown for any uploading threads to complete. Now that I think about it, there may be other services stopping that may not be available when these uploads complete, making the partial uploads useless.
I was looking at the S3 blobstore, and it looks like it just executes a shutdownNow() without waiting. I'm thinking I should follow suit - wdyt?
In e394e05 I went ahead and followed the S3 Blobstore approach, so the questionable log statement is now gone.
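A sketch of the revised doStop under that approach (the debug message is illustrative, not the actual code):

import java.util.List;

@Override
protected void doStop() throws Exception {
  // Mirror the S3 blobstore: abandon in-flight chunk uploads instead of
  // blocking shutdown; the services needed to finish a blob may already be gone.
  List<Runnable> abandoned = executorService.shutdownNow();
  if (!abandoned.isEmpty()) {
    log.debug("Abandoned {} queued chunk uploads during shutdown", abandoned.size());
  }
}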
BlobInfo blobInfo = BlobInfo.newBuilder(bucket, chunkName).build();
Blob blob = storage.create(blobInfo, chunk, BlobTargetOption.disableGzipContent());
singleChunk = Optional.of(blob);
} else {
Formatting.
Fixed in e394e05
}
catch(Exception e) {
  throw new BlobStoreException("Error uploading blob", e, null);
} finally {
Formatting.
Fixed in e394e05
This change:
* Changes MultipartUploader#doStop to use shutdownNow rather than waiting for parts to finish uploading; there is no guarantee the other services we would need to complete the blob upload are still available, so we might as well stop quickly.
* Uses Dropwizard Metrics' InstrumentedExecutorService to allow us to peek at the thread pool utilization in the MultipartUploader (see the sketch below).
* Corrects formatting
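A sketch of the instrumented pool - assuming a MetricRegistry is available for injection; the metric name is illustrative:

import java.util.concurrent.Executors;

import com.codahale.metrics.InstrumentedExecutorService;
import com.codahale.metrics.MetricRegistry;
import com.google.common.util.concurrent.ListeningExecutorService;
import com.google.common.util.concurrent.MoreExecutors;

// Dropwizard's InstrumentedExecutorService records submitted/running/completed
// task metrics for the wrapped pool under the given name.
private final ListeningExecutorService executorService = MoreExecutors.listeningDecorator(
    new InstrumentedExecutorService(
        Executors.newCachedThreadPool(),
        metricRegistry,
        "nexus-google-cloud-storage-multipart-upload"));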
This pull request is a byproduct of my recent learning on how to use Google Cloud's Stackdriver Profiler.
I set up an NXRM 3.19.1-01 instance with version 0.9.2 of this plugin and fired up a stress test (internal tool) to see what I could see in Profiler.
One interesting stack immediately stood out and surprised me:
Drilling deeper into that wide column:
Why is the Google Cloud Storage client compressing content on the way to the bucket?
The content in NXRM workloads is predominantly already compressed (jars, zips, tar.gz, docker images, etc.). Adding an additional layer of compression would have little benefit, as it wouldn't reduce the content size by much.
Furthermore, the width of these columns in the profiler indicated a significant amount of time was spent on CPU for writes, which isn't ideal.
A little more research and I came across this issue in the google-cloud-java project:
googleapis/google-cloud-java#3822
The comments related to performance issues due to gzip compression being enabled by default immediately clicked.
To date, the released versions of this plugin have used the InputStream variants of the storage write methods. The feature described in #3822 above is not available on the InputStream variants (which are deprecated) - only on the byte[] variants. This led me to think about #22, which leveraged the byte[] variants.

I revived the multipart-upload branch, brought it up to master, and disabled gzip compression. I found right away that this had a positive impact:
Stackdriver Profiler shows here that a) the stack to upload content was shorter (gzip compression gone), and b) the stack was narrower, using overall less CPU time.
The net effect from our internal stress test is a 51% improvement in throughput across the board:
Wait - this PR includes two major changes: disabling gzip compression AND using parallel threads to write to the bucket. Aren't you skipping a step?
Yes. I have another branch available - https://github.com/sonatype-nexus-community/nexus-blobstore-google-cloud/tree/single-thread-buffered-upload. In that branch, I skipped the thread pool and uploaded the chunks on the main request thread, with gzip compression disabled. While the performance was better than the baseline (about 25%), it still did not match the gains of the parallel version. I'll keep that branch around as an alternative if we need to explore it further.
This pull request is fairly large and makes the following changes:
* Content larger than nexus.gcs.multipartupload.chunksize (default is 5MB) will trigger parallel uploads of the content. These uploads are performed in a dedicated thread pool with a size limited to the number of available processors (see the sketch at the end of this description).

It relates to the following issue #s:
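For illustration, a rough outline of the parallel upload-and-compose flow described above. All names here are assumptions rather than the plugin's exact internals, and the oversized-final-chunk fallback is omitted for brevity:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

import com.google.cloud.storage.Blob;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.Storage.BlobTargetOption;

Blob composeChunks(Storage storage, ExecutorService pool, String bucket,
                   String destination, List<byte[]> chunks) throws Exception {
  List<String> chunkNames = new ArrayList<>();
  List<Future<Blob>> uploads = new ArrayList<>();
  for (int i = 0; i < chunks.size(); i++) {
    // The first chunk is written under the destination name itself;
    // GCS compose accepts at most 32 source objects per request.
    String chunkName = (i == 0) ? destination : destination + ".chunk" + i;
    chunkNames.add(chunkName);
    byte[] chunk = chunks.get(i);
    uploads.add(pool.submit(() -> storage.create(
        BlobInfo.newBuilder(bucket, chunkName).build(),
        chunk, BlobTargetOption.disableGzipContent())));
  }
  for (Future<Blob> upload : uploads) {
    upload.get(); // propagate any chunk upload failure
  }
  // Compose the uploaded chunks into the destination object.
  return storage.compose(Storage.ComposeRequest.newBuilder()
      .setTarget(BlobInfo.newBuilder(bucket, destination).build())
      .addSource(chunkNames)
      .build());
}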