RFC: logging: experimental batching writer #2910
Conversation
Manually written batching implementation. Unlike GAX batching, it implements flushing and does not deal with partition keys. If we're OK with where this is heading, we should make this work with batching settings (shouldn't be hard) and load-test before migrating things to it. If not, this can serve as a starting point for a better GAX batching implementation.

In an experiment, I publish 1M messages of 300 bytes each. Using LoggingHandler gives me ~14,000 msg/s. BatchingWriter gives ~67,000, using similar BatchingSettings. Letting users configure these settings might be important though; I have observed ~300K msg/s given enough CPU and memory.
Updates #2796
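The throughput numbers above can be reproduced with a micro-benchmark along these lines (a sketch only: the 1M-message count and 300-byte payload mirror the experiment, but the writer here is a hypothetical stand-in, not LoggingHandler or BatchingWriter):

```java
import java.util.function.Consumer;

// Sketch of the msg/s measurement: publish N fixed-size messages into a
// writer and divide by elapsed wall time.
public class ThroughputBench {
    public static double measure(Consumer<byte[]> writer, int messages, int size) {
        byte[] payload = new byte[size]; // fixed-size message, e.g. 300 bytes
        long start = System.nanoTime();
        for (int i = 0; i < messages; i++) {
            writer.accept(payload);
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        return messages / seconds; // messages per second
    }

    public static void main(String[] args) {
        // No-op writer stand-in; a real run would plug in the batching writer.
        double rate = measure(b -> {}, 1_000_000, 300);
        System.out.println("msg/s: " + (long) rate);
    }
}
```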
```java
// Whoever calls send serializes the proto; so we do it off-thread.
// This gives better CPU utilization if there are few producer threads
// on a many-core machine.
executor.execute(
```
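The off-thread serialization described in that comment can be sketched as follows (a minimal illustration under assumed names; `send`, `serialize`, and the queue are not the PR's actual API):

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Sketch: send() hands the raw entry to an executor, which performs the
// (CPU-heavy) serialization off the caller's thread before enqueueing the
// bytes for batching.
public class OffThreadSerialization {
    private final ExecutorService executor = Executors.newFixedThreadPool(4);
    private final BlockingQueue<byte[]> batchQueue = new LinkedBlockingQueue<>();

    // Stand-in for serializing a LogEntry proto.
    private byte[] serialize(String entry) {
        return entry.getBytes(StandardCharsets.UTF_8);
    }

    public void send(String entry) {
        // The producer returns immediately; serialization cost is paid by
        // the executor's threads, which helps on a many-core machine with
        // few producer threads.
        executor.execute(() -> batchQueue.add(serialize(entry)));
    }

    // Wait for queued work to finish and report how many entries were
    // serialized. Shuts the writer down; for this sketch only.
    public int close() throws InterruptedException {
        executor.shutdown();
        executor.awaitTermination(10, TimeUnit.SECONDS);
        return batchQueue.size();
    }
}
```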
I still think that we should try refactoring the gax code instead of writing batching code from scratch. There are a lot of useful features in the gax implementation that are lost here. Before the Bigtable client is ready, I will need the following features in batching:
General feedback on the PR:
@igorbernstein2 Thank you for your thoughts. I agree that putting this in gax will be valuable. I'm mostly using this to figure out what logging needs from whatever ends up in gax. This is why it's an RFC after all. To answer your points:
I think this is easily done by changing Semaphore to FlowController, right?
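The Semaphore-based flow control being discussed can be sketched roughly like this (illustrative names only; gax's FlowController additionally limits element counts and supports non-blocking rejection, which is what the swap would add):

```java
import java.util.concurrent.Semaphore;

// Sketch of byte-based flow control with a Semaphore: producers block
// when too many bytes are outstanding; completed RPCs release permits.
public class ByteFlowControl {
    private final Semaphore outstandingBytes;

    public ByteFlowControl(int maxOutstandingBytes) {
        outstandingBytes = new Semaphore(maxOutstandingBytes);
    }

    // Called by producers before enqueueing an entry; blocks when the
    // writer is over its byte limit.
    public void reserve(int bytes) throws InterruptedException {
        outstandingBytes.acquire(bytes);
    }

    // Called when the batch RPC carrying these bytes completes.
    public void release(int bytes) {
        outstandingBytes.release(bytes);
    }

    public int availablePermits() {
        return outstandingBytes.availablePermits();
    }
}
```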
Please correct me if I'm wrong. I think these 3 are somewhat related. At least, they are predicated on the ability to identify individual requests within a batch. Pubsub needs this too. However, logging doesn't (writes are all or nothing, and failures are reported to the error manager). I haven't measured how much CPU work this would cost. If it's significant, we should find a way for logging not to pay that cost.
I don't think this makes sense for logging. The doc explicitly discourages throwing exceptions.
Definitely makes sense as a convenience method. I believe it can be implemented by:

```java
initFlush();
for (Future<?> future : pendingRpcs()) future.get(); // error handling elided.
```
Does Bigtable require a single producer or multiple producers per writer? I think isReady makes sense with a single producer, but could cause a thundering herd problem with multiple. I need to think more about onReady.
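The flush approach quoted above (initFlush plus waiting on pending RPC futures) can be fleshed out as a sketch; all names here are illustrative stand-ins, not the PR's real types (which would use ApiFuture and actual batch RPCs):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: flush() seals the current partial batch, then blocks until
// every in-flight RPC future has completed.
public class FlushSketch {
    private final List<Future<?>> pending =
        Collections.synchronizedList(new ArrayList<>());
    // Daemon threads so a forgotten shutdown doesn't keep the JVM alive.
    private final ExecutorService rpcExecutor =
        Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        });

    // Stand-in for sending a batch RPC; the returned future is tracked.
    public void sendBatch(Runnable rpc) {
        pending.add(rpcExecutor.submit(rpc));
    }

    private void initFlush() {
        // Would seal the current partial batch and submit it as an RPC.
    }

    public void flush() throws ExecutionException, InterruptedException {
        initFlush();
        List<Future<?>> snapshot;
        synchronized (pending) {
            snapshot = new ArrayList<>(pending);
        }
        for (Future<?> future : snapshot) future.get(); // error handling elided
    }
}
```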
@garrettjonesgoogle Are you OK with me submitting this as-is to a branch?
I'll merge this into a branch.