Multipart Upload with GCS .NET SDK #13021
Comments
Showing the S3 code doesn't really help us know what your GCS code looks like. Please provide a minimal example using Google.Cloud.Storage.V1.
@jskeet

using System.Buffers;
using System.Globalization;
using CsvHelper;
using Google.Cloud.Storage.V1;
using Report.Generator.Infra.Repositories;

namespace Report.Generator.Infra.Generators
{
    public class CsvGCSReportGenerator
    {
        private readonly StorageClient _storageClient;

        public CsvGCSReportGenerator(StorageClient storageClient)
        {
            _storageClient = storageClient;
        }

        public async Task GenerateAsync(string bucketName, string fileName, int blockSize)
        {
            byte[] buffer = ArrayPool<byte>.Shared.Rent(blockSize);
            int bufferPosition = 0;
            using var memoryStream = new MemoryStream();
            using var writer = new StreamWriter(memoryStream);
            using var csvWriter = new CsvWriter(writer, CultureInfo.InvariantCulture);
            int partNumber = 1;

            await foreach (var product in ProductRepository.FetchProductsAsync())
            {
                memoryStream.SetLength(0);
                csvWriter.WriteRecord(product);
                await csvWriter.NextRecordAsync();
                await writer.FlushAsync();
                memoryStream.Position = 0;

                while (memoryStream.Position < memoryStream.Length)
                {
                    int bytesToRead = Math.Min(blockSize - bufferPosition, (int)(memoryStream.Length - memoryStream.Position));
                    int bytesRead = await memoryStream.ReadAsync(buffer, bufferPosition, bytesToRead);
                    bufferPosition += bytesRead;

                    if (bufferPosition == blockSize)
                    {
                        await UploadPartAsync(buffer, bufferPosition, bucketName, fileName, partNumber++);
                        bufferPosition = 0;
                    }
                }
            }

            if (bufferPosition > 0)
            {
                await UploadPartAsync(buffer, bufferPosition, bucketName, fileName, partNumber);
            }

            ArrayPool<byte>.Shared.Return(buffer);
        }

        private async Task UploadPartAsync(byte[] buffer, int bufferLength, string bucketName, string fileName, int partNumber)
        {
            using var partStream = new MemoryStream(buffer, 0, bufferLength);
            await _storageClient.UploadObjectAsync(bucketName, $"{fileName}.csv", "text/csv", partStream);
            Console.WriteLine($"Uploaded part {partNumber}");
        }
    }
}

Previous attempt with Resumable Upload:

using System.Globalization;
using System.IO;
using System.Net.Mime;
using CsvHelper;
using Google.Apis.Upload;
using Google.Cloud.Storage.V1;
using Report.Generator.Domain.Entities;
using Report.Generator.Infra.Repositories;

namespace Report.Generator;

public class Program
{
    public async static Task Main()
    {
        Console.WriteLine($"Started at {DateTime.Now}");

        using var memoryStream = new MemoryStream();
        using var writer = new StreamWriter(memoryStream);
        using var csvWriter = new CsvWriter(writer, CultureInfo.InvariantCulture);
        csvWriter.WriteHeader<Product>();
        await csvWriter.NextRecordAsync();

        var client = await StorageClient.CreateAsync();
        var options = new UploadObjectOptions
        {
            ChunkSize = UploadObjectOptions.MinimumChunkSize
        };
        var uploadUri = await client.InitiateUploadSessionAsync(Environment.GetEnvironmentVariable("BUCKET_NAME"), "report.csv", "text/csv", contentLength: null, options);

        int batchSize = 100_000;
        await foreach (var product in ProductRepository.FetchUnbufferedProductsAsync(batchSize))
        {
            csvWriter.WriteRecord(product);
            csvWriter.NextRecord();
            Console.WriteLine(product.Title);
        }

        await writer.FlushAsync();
        memoryStream.Position = 0;

        IProgress<IUploadProgress> progress = new Progress<IUploadProgress>(
            p => Console.WriteLine($"bytes: {p.BytesSent}, status: {p.Status}")
        );

        var actualUploader = ResumableUpload.CreateFromUploadUri(uploadUri, memoryStream);
        actualUploader.ChunkSize = UploadObjectOptions.MinimumChunkSize * 2;
        actualUploader.ProgressChanged += progress.Report;
        await actualUploader.UploadAsync();

        Console.WriteLine($"Finished at {DateTime.Now}");
    }
}
No, you can't - at least not with our libraries. I'm afraid it's a use case we just don't support at the moment. Assuming I've understood you correctly, this is basically equivalent to this issue. I think it's unlikely that we'll support this any time soon. What you could do is upload each part to a separate object, and then use the Compose operation (from StorageService; it's not exposed directly in StorageClient) to create a single object after the fact. I'll reassign this to a member of the Storage team in case you have any further questions.
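For readers following along, the workaround described above might look roughly like the sketch below. It is not code from the maintainers. It assumes each chunk has already been uploaded to its own object, and the `PartName` naming scheme and the `ComposePartsAsync` helper are hypothetical names introduced only for illustration. It reaches the Compose operation through the `Service` property of `StorageClient` (the underlying `StorageService`), and it assumes a single Compose request, which accepts at most 32 source objects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Google.Apis.Storage.v1.Data;
using Google.Cloud.Storage.V1;

public static class ComposeSketch
{
    // Compose accepts at most 32 source objects per request.
    public const int MaxComposeSources = 32;

    // Hypothetical naming scheme for part objects, e.g. "report.csv.part-00001".
    public static string PartName(string fileName, int partNumber) =>
        $"{fileName}.part-{partNumber:D5}";

    // Hypothetical helper: concatenates already-uploaded parts into one object,
    // then deletes the parts. Parts are concatenated in the order given.
    public static async Task ComposePartsAsync(
        StorageClient client, string bucket, string destination, IReadOnlyList<string> partNames)
    {
        if (partNames.Count == 0 || partNames.Count > MaxComposeSources)
        {
            throw new ArgumentException(
                $"Expected between 1 and {MaxComposeSources} parts for a single Compose call.",
                nameof(partNames));
        }

        var request = new ComposeRequest
        {
            // Fully qualified to avoid ambiguity with System.Object.
            Destination = new Google.Apis.Storage.v1.Data.Object { ContentType = "text/csv" },
            SourceObjects = partNames
                .Select(name => new ComposeRequest.SourceObjectsData { Name = name })
                .ToList()
        };

        // Compose lives on the underlying StorageService, not on StorageClient itself.
        await client.Service.Objects.Compose(request, bucket, destination).ExecuteAsync();

        // Clean up the temporary part objects once the composed object exists.
        foreach (var name in partNames)
        {
            await client.DeleteObjectAsync(bucket, name);
        }
    }
}
```

In the generator posted earlier, `UploadPartAsync` would upload to `ComposeSketch.PartName(fileName, partNumber)` instead of reusing the same object name, and `ComposePartsAsync` would run once after the last part. With more than 32 parts, one option is to compose in rounds, since a composed object can itself be a source for a later Compose call.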
Alright. Thanks a lot for the feedback.
@JesseLovelace, could you help with this feature? Maybe we could work together on a solution.
My team will take a look and evaluate when we can get this done.
Hey guys, I recently developed a proof of concept for on-demand CSV file generation.
The goal is to retrieve data from a relational database, map it to CSV, and then upload it to a cloud bucket in chunks of a given size (e.g. 5 MB).
I've read the docs and tried to use the Resumable Upload feature, but my file gets overwritten.
A "complete" method would be very useful, but I didn't find anything like it. Could you help out here?
Successful sample code with AWS provider:
PS: I've seen that GCS does have an XML multipart upload API.