This repository has been archived by the owner on Sep 26, 2023. It is now read-only.

Add ES bulk configuration #14

Merged
merged 2 commits into jaegertracing:master on Feb 19, 2020

Conversation

pavolloffay
Member

Signed-off-by: Pavol Loffay <[email protected]>

@objectiser objectiser left a comment


Just wondering if this is necessary - wouldn't it be better to use the queue/retry processor?

@pavolloffay
Member Author

It is a good question. Thanks for pointing it out.

There might be a couple of differences:

  1. The main purpose of the queue retry processor is to retry on failures. It does not batch data to a predefined size, whereas the ES bulk processor batches requests and sends the batch once some criterion is satisfied, e.g. size or time.
  2. The queue processor works per TraceData, while the ES bulk processor batches generic ES put requests (e.g. store spans, store services). A TraceData contains a batch of spans from a single process.

@objectiser

@pavolloffay Sorry I was thinking of this one - the batch processor. Although I think it could be used in conjunction with the queue/retry one to provide a complete solution.

@pavolloffay
Member Author

I am not sure whether it supports the same features as the ES bulk processor. In any case, it would require refactoring our ES storage layer.

@objectiser

@pavolloffay Seems to be the same - batches based on size or timeout. I'll review shortly so we can merge for now, but I think this is an area we need to discuss further, to avoid using the storage layer.

@pavolloffay
Member Author

Comparison of ES Bulk properties and OTEL batch processor:

      --es.bulk.actions int                                The number of requests that can be enqueued before the bulk processor decides to commit (default 1000)
      --es.bulk.flush-interval duration                    A time.Duration after which bulk requests are committed, regardless of other thresholds. Set to zero to disable. By default, this is disabled. (default 200ms)
      --es.bulk.size int                                   The number of bytes that the bulk requests can take up before the bulk processor decides to commit (default 5000000)
      --es.bulk.workers int                                The number of workers that are able to receive bulk requests and eventually commit them to Elasticsearch (default 1)


// OTEL Batch processor
	configmodels.ProcessorSettings `mapstructure:",squash"`

	// Timeout sets the time after which a batch will be sent regardless of size.
	Timeout *time.Duration `mapstructure:"timeout,omitempty"`

	// SendBatchSize is the size of a batch which after hit, will trigger it to be sent.
	SendBatchSize *int `mapstructure:"send_batch_size,omitempty"`

	// NumTickers sets the number of tickers to use to divide the work of looping
	// over batch buckets. This is an advanced configuration option.
	NumTickers int `mapstructure:"num_tickers,omitempty"`

	// TickTime sets time interval at which the tickers tick. This is an advanced
	// configuration option.
	TickTime *time.Duration `mapstructure:"tick_time,omitempty"`

	// RemoveAfterTicks is the number of ticks that must pass without a span arriving
	// from a node after which the batcher for that node will be deleted. This is an
	// advanced configuration option.
	RemoveAfterTicks *int `mapstructure:"remove_after_ticks,omitempty"`
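
For reference, the mapstructure tags in the struct above correspond to collector configuration along these lines. This is a sketch using only the keys shown; the values are illustrative, chosen to mirror the ES bulk defaults listed earlier:

```yaml
processors:
  batch:
    # roughly analogous to --es.bulk.flush-interval
    timeout: 200ms
    # roughly analogous to --es.bulk.actions
    send_batch_size: 1000
```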

The retry processor is more problematic because it retries the whole TraceData, whereas the ES bulk API might fail just a single item from the bulk request - so the whole bulk cannot simply be retried, and there is no way to tell the retry processor which items should be retried.
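The partial-failure point can be illustrated with a small sketch. The types here are hypothetical, not the actual ES client API: a bulk response carries a per-item status, and only the failed items should be resubmitted, which a processor that retries whole TraceData batches cannot express:

```go
package main

import "fmt"

// bulkItem models one entry of a bulk response: the request ID and whether
// the server rejected it. (Hypothetical type for illustration only.)
type bulkItem struct {
	ID     string
	Failed bool
}

// partialRetry returns just the IDs that need to be resubmitted, leaving
// the successfully indexed items alone.
func partialRetry(resp []bulkItem) []string {
	var retry []string
	for _, item := range resp {
		if item.Failed {
			retry = append(retry, item.ID)
		}
	}
	return retry
}

func main() {
	resp := []bulkItem{
		{ID: "span-1", Failed: false},
		{ID: "span-2", Failed: true},
		{ID: "span-3", Failed: false},
	}
	// Only the failed item is resubmitted; a whole-batch retry would
	// needlessly reindex span-1 and span-3 as well.
	fmt.Println(partialRetry(resp))
}
```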

If we want to properly incorporate the OTEL batch processor, it will require changing the whole span writer implementation.

As this seems controversial, I will just use the default values for the bulk processor and not expose the configuration.

@pavolloffay
Member Author

PR updated: I have made the bulk configuration package private.

@pavolloffay pavolloffay merged commit 45b51be into jaegertracing:master Feb 19, 2020
@yurishkuro
Member

It seems this is going quite into the weeds, while we have very high level decisions still pending. Do we have a definitive roadmap for this prototype, with clear goals and success criteria?

@objectiser

@yurishkuro Agree, sorry, my fault. I haven't been able to focus as much time on it as I wanted, and I still need to look at moving areas of discussion from the Google doc into issues.

In terms of goals, ideally the prototype would be a drop-in replacement for the jaeger collector. Success criteria - should we create a milestone and associate the topics that need to be addressed with it?

I know the metrics comparison is still outstanding, and there will definitely be areas that need to be addressed, possibly upstream, which may take some detailed discussions there.

@pavolloffay
Member Author

I have added an agenda item to discuss this during the bi-weekly meeting.

@yurishkuro
Member

Issues in a milestone could work. I just don't want us to waste cycles on minor tweaks while big and risky questions are unresolved, such as the circular dependencies, configurability, and performance.

@pavolloffay
Member Author

+1, I do have cycles to work on this.

My next steps are

  1. try resolving circular dependency so we can move this to jaegertracing/jaeger
  2. expose configuration via flags (My PR to tweak cobra.Command in OTEL collector has been accepted.)

So far I want to reuse as much as possible from our existing codebase and do performance tweaks later. However, it's worth documenting possible improvements like this one with the batch processor.
