Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Uses ES bulk api only when there's more than one span #1146

Merged
merged 1 commit into from
Jun 27, 2016

Conversation

codefromthecrypt
Copy link
Member

During a test where 100 single-span messages are sent to Kafka at the
same time, I noticed only 58-97 of them would end up in storage
eventhough all messages parsed properly and no operations failed.

After a 100ms/message pause was added, the store rate of this test went
to 100%, so figured it was some sort of state issue. I noticed the code
was using Bulk operations regardless of input size, so as a wild guess
changed the special-case single-span messages. At least in this test, it
raised the success rate to 100% without any pausing needed.

I don't know why this worked, but it seems sensible to not use bulk apis
when there's no bulk action to perform.

I started to write a unit test to validate single-length lists don't use
bulk, but the Mockito involved became too verbose as the Elasticsearch
client uses chaining and other patterns that are tedious to mock.

Instead, we should make a parallel integration test and apply them to
all storage components.

See #1141

@codefromthecrypt
Copy link
Member Author

ping @anuraaga

if (spans.isEmpty()) return Futures.immediateFuture(null);

// Create a bulk request when there is more than one span to store
ListenableFuture<?> future;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recommend final since it's a branched initialization

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope was too bad.. leaving un-final :P

@anuraaga
Copy link
Contributor

LGTM - thanks for fixing (and feel the pain on the difficulty of unit testing ES client...).

During a test where 100 single-span messages are sent to Kafka at the
same time, I noticed only 58-97 of them would end up in storage
eventhough all messages parsed properly and no operations failed.

After a 100ms/message pause was added, the store rate of this test went
to 100%, so figured it was some sort of state issue. I noticed the code
was using Bulk operations regardless of input size, so as a wild guess
changed the special-case single-span messages. At least in this test, it
raised the success rate to 100% without any pausing needed.

I don't know why this worked, but it seems sensible to not use bulk apis
when there's no bulk action to perform.

I started to write a unit test to validate single-length lists don't use
bulk, but the Mockito involved became too verbose as the Elasticsearch
client uses chaining and other patterns that are tedious to mock.

Instead, we should make a parallel integration test and apply them to
all storage components.
@codefromthecrypt codefromthecrypt merged commit 245e5c0 into master Jun 27, 2016
@codefromthecrypt codefromthecrypt deleted the one-isnt-bu branch June 27, 2016 02:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants