
Give users the ability to disable gzip content encoding to increase throughput #3822

Closed
bmenasha opened this issue Oct 16, 2018 · 9 comments
Labels: api: storage (Issues related to the Cloud Storage API), type: feature request (‘Nice-to-have’ improvement, new feature or different behavior or design)

@bmenasha

  • OS: all
  • Java version: 1.8 (probably all)
  • google-cloud-java version(s): 1.23.0

Steps to reproduce

  1. Upload to GCS using the cloud storage library.
  2. Observe poor throughput (~20 MiB/s) and that much of the time is spent compressing the content.

On GCE I can achieve 70 MiB/s if I disable gzip encoding when uploading to GCS; with gzip encoding enabled, throughput drops to around 20 MiB/s. However, disabling gzip encoding isn't an option exposed in the interface, which puts google-cloud-java at a disadvantage compared with the REST API it wraps.

This is a request to expose the gzip encoding of the stream as a storage option. Looking at the code in com.google.cloud.storage.spi.v1.HttpStorageRpc.create, nothing ever calls StorageRequest.setDisableGZipContent (which defaults to false) on the underlying com.google.api.services.storage.StorageRequest:

  Storage.Objects.Insert insert = storage.objects()
      .insert(storageObject.getBucket(), storageObject,
          new InputStreamContent(storageObject.getContentType(), content));
  insert.getMediaHttpUploader().setDirectUploadEnabled(true);
  setEncryptionHeaders(insert.getRequestHeaders(), ENCRYPTION_KEY_PREFIX, options);
  return insert.
  ... .execute();

Can we please provide this feature to maintain comparable performance with the com.google.api.services.storage.* client?

thanks

Code snippet

Upload an object to GCS and observe performance, then do the same with gzip disabled:

    /// USING THE OLD CLIENT
    Storage client = getService();
    Storage.Objects.Insert insertRequest =
            client.objects().insert(bucketName, objectMetadata, contentStream);

    /// Currently NO way to do this in google-cloud-java; needed for much faster throughput on GCE.
    insertRequest.setDisableGZipContent(true);
    insertRequest.getMediaHttpUploader().setDisableGZipContent(true);

    insertRequest.getMediaHttpUploader().setDirectUploadEnabled(true);
    long start = System.currentTimeMillis();
    insertRequest.execute();
    long end = System.currentTimeMillis();
    System.out.println( String.format("%.02f MiB/s ",(length / (1024*1024.0) )/ ((end - start) / 1000.0)));
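The throughput arithmetic at the end of the snippet can be factored into a small self-contained helper (a sketch; the class and method names are mine, not part of either client library):

```java
public class Throughput {
    // Returns throughput in MiB/s given bytes transferred and elapsed wall-clock milliseconds.
    static double mibPerSecond(long bytes, long elapsedMillis) {
        double mib = bytes / (1024 * 1024.0);
        double seconds = elapsedMillis / 1000.0;
        return mib / seconds;
    }

    public static void main(String[] args) {
        // 70 MiB uploaded in 1 second, matching the best number quoted above.
        System.out.println(String.format("%.02f MiB/s", mibPerSecond(70L * 1024 * 1024, 1000)));
        // prints "70.00 MiB/s"
    }
}
```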

Any additional information below

Also, perhaps document somewhere that in order to get any reasonable performance on Java 8 it's necessary to disable Galois/Counter Mode (GCM) in java.security: https://stackoverflow.com/questions/25992131/slow-aes-gcm-encryption-and-decryption-with-java-8u20
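For reference, the workaround discussed in that Stack Overflow thread amounts to steering TLS away from the slow AES-GCM cipher suites. A hedged sketch of the java.security edit (the property name is standard JDK configuration, but whether GCM belongs in your disabled list is deployment-specific; verify against your JDK build's defaults before applying):

```
# $JAVA_HOME/jre/lib/security/java.security
# Appending GCM prevents TLS from negotiating AES-GCM suites,
# which are slow on Java 8 without AES-NI GCM intrinsics.
jdk.tls.disabledAlgorithms=SSLv3, GCM
```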

@JustinBeckwith added the triage me label Oct 16, 2018
@chingor13 added the type: feature request and api: storage labels Oct 18, 2018
@JustinBeckwith removed the triage me label Oct 18, 2018
@ajaaym
Contributor

ajaaym commented Oct 24, 2018

I see that Storage.Objects.Insert extends StorageRequest which extends AbstractGoogleJsonClientRequest which in turn extends AbstractGoogleClientRequest.

setDisableGZipContent is a public setter on AbstractGoogleClientRequest, so you should be able to call

insertRequest.setDisableGZipContent(true);

directly.

@bmenasha
Author

Thanks. Perhaps there is some miscommunication: I'm complaining about the lack of this feature in this project, google-cloud-java/google-cloud-storage, not in the library it wraps, google-api-java-client-services.
thanks

@ajaaym
Contributor

ajaaym commented Oct 30, 2018

@bmenasha Thanks for the clarification. Using a BlobWriteChannel will let you send data without compressing it.

 String bucketName = "my_unique_bucket";
 String blobName = "my_blob_name";
 BlobId blobId = BlobId.of(bucketName, blobName);
 byte[] content = "Hello, World!".getBytes(UTF_8);
 BlobInfo blobInfo = BlobInfo.newBuilder(blobId).setContentType("text/plain").build();
 try (WriteChannel writer = storage.writer(blobInfo)) {
   writer.write(ByteBuffer.wrap(content, 0, content.length));
 } catch (Exception ex) {
   // handle exception
 }

@PadrePadrone

Hi,

We need this feature as well.

  1. What is the difference between using writer.write(...) and using storage.create(...)?
  2. Will writer create the blob if it doesn't exist?
  3. Will writer work the same if I don't provide a content type? How about if I provide a different one, such as "application/octet-stream"?

@ajaaym
Contributor

ajaaym commented Oct 31, 2018

Please find my answers below.

  1. Writer gives better control over large blob transfers. Storage.create() with a stream is deprecated because it cannot be retried, as explained here and here.
  2. Yes, it will create the blob if it doesn't exist.
  3. If no content type is provided, it defaults to "application/octet-stream".

@PadrePadrone

Hi @ajaaym

Following your recommendation, I replaced the use of storage.create:
Blob blob = clientSetup.storage.create(blobInfo, data.getData());

With using a writer channel:
writeChannel = clientSetup.storage.writer(blobInfo);
writeChannel.setChunkSize(data.getDataSize());
writeChannel.write(ByteBuffer.wrap(data.getData()));

Performance became much worse with the new code.

Is this the expected behaviour?
Is there any additional code I need to write when using the writer channel?

@mcantrell

This is a serious problem with the new storage client. The internal method in HttpStorageRpc does not attempt any configuration on the mediaHttpUploader.

https://github.com/googleapis/google-cloud-java/blob/master/google-cloud-clients/google-cloud-storage/src/main/java/com/google/cloud/storage/spi/v1/HttpStorageRpc.java#L271

This can severely impair performance for files that are not suitable for gzip compression. For instance, the following test, run with a 1 GB binary file, produces very different results due to the high CPU overhead of compression without significant size reduction:

@Test
public void uploadWithoutGzip() throws Exception {
    executeUpload(true);
}

@Test
public void uploadWithGzip() throws Exception {
    executeUpload(false);
}

private void executeUpload(Boolean disableGzip) throws IOException {
    assertTrue(UPLOAD_FILE + " does not exist", UPLOAD_FILE.exists());
    Storage storage = buildStorage();
    try (InputStream uploadStream = new FileInputStream(UPLOAD_FILE)) {
        Stopwatch stopwatch = Stopwatch.createStarted();
        InputStreamContent content = new InputStreamContent(null, uploadStream);
        StorageRequest<StorageObject> insert = storage.objects().insert(BUCKET_NAME, null, content)
                .setName(BUCKET_PATH)
                .setDisableGZipContent(disableGzip);
        insert.getMediaHttpUploader().setDirectUploadEnabled(true);
        insert.execute();

        long elapsedSeconds = stopwatch.stop().elapsed(TimeUnit.SECONDS);
        double fileSizeMb = (double)UPLOAD_FILE.length() / 1024 / 1024;
        double throughput = fileSizeMb / elapsedSeconds;
        log.info("Completed upload: elapsed={}s,fileSize={}MB,throughput={}MB/s",
                elapsedSeconds, fileSizeMb, throughput);
        assertTrue("Expected at least 20MB/s", throughput > 20);
    }
}

Without gzip: ~35 MB/s
With gzip: ~16 MB/s
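The "no significant size reduction" claim is easy to confirm locally without touching GCS. A minimal sketch using only java.util.zip (class and method names are mine, not from either client library): gzipping incompressible random bytes actually grows them slightly, while compressible data collapses.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Random;
import java.util.zip.GZIPOutputStream;

public class GzipOverheadDemo {
    // Gzips a byte array in memory and returns the compressed size in bytes.
    static int gzippedSize(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.size();
    }

    public static void main(String[] args) throws IOException {
        byte[] random = new byte[1 << 20]; // 1 MiB of incompressible data
        new Random(42).nextBytes(random);
        byte[] zeros = new byte[1 << 20];  // 1 MiB of highly compressible data

        // Random bytes barely shrink (gzip framing even adds a few bytes);
        // all-zero bytes compress to roughly a kilobyte.
        System.out.println("random: " + gzippedSize(random) + " bytes after gzip");
        System.out.println("zeros:  " + gzippedSize(zeros) + " bytes after gzip");
    }
}
```

So for binary payloads like the 1 GB test file above, the CPU spent in gzip buys essentially nothing on the wire.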

I'd think that this deserves a higher priority. We cannot upgrade to the new client library until this is fixed.

@mcantrell

mcantrell commented Nov 14, 2018

@PadrePadrone

I think the write channel performance is similar to issue #3929. The write operation to the API is tied to the chunk size:

https://github.com/googleapis/google-cloud-java/blob/master/google-cloud-clients/google-cloud-storage/src/main/java/com/google/cloud/storage/BlobWriteChannel.java#L45

Unless you want to use large buffers, the performance will likely be slower.

Edit: never mind... I see that you're adjusting the chunk size to the size of the data. I missed that.

@mcantrell

mcantrell commented Nov 14, 2018

Attaching a test case which demonstrates multiple upload mechanisms:

  • storage.create: FAIL (~15 MB/s)
  • storage.writer: FAIL (~15 MB/s)
  • internal storage client (gzip enabled): FAIL (~15 MB/s)
  • internal storage client (gzip disabled): PASS (~35 MB/s)

See StoragePerformanceUploadTest in storage-performance-new.zip

Edit: test file available here

7 participants