From e6ce08067317c7009d0eb60b214a4ee333d22aef Mon Sep 17 00:00:00 2001
From: Karen Metts <35154725+karenzone@users.noreply.github.com>
Date: Thu, 12 Mar 2020 15:47:01 -0400
Subject: [PATCH] [DOC] Restructure and expand best practices doc (#55)

* Increase the default number of threads to 16

Increase the default number of threads passed to the EventProcessorHost to 16, to match the standard default. Re-work docs to reflect the fact the change to defaults, and the fact that event_hubs*partitions is not the maximum number of threads, and more threads may yield better performance.

* Restructure best practices docs

* Remove rb file from pr

Co-authored-by: Rob Bavey
---
 docs/index.asciidoc | 54 +++++++++++++++++++++++++++++++--------------
 1 file changed, 38 insertions(+), 16 deletions(-)

diff --git a/docs/index.asciidoc b/docs/index.asciidoc
index 6fa0c61..a1bb0d3 100644
--- a/docs/index.asciidoc
+++ b/docs/index.asciidoc
@@ -78,35 +78,57 @@ https://portal.azure.com[Azure Portal]`-> Blob Storage account -> Access keys`.
 Here are some guidelines to help you avoid data conflicts that can cause lost events.
-* **Create a Logstash consumer group.**
+* <>
+* <>
+* <>
+
+[id="plugins-{type}s-{plugin}-bp-group"]
+====== Create a Logstash consumer group
 Create a new consumer group specifically for Logstash. Do not use the $default or any other consumer group that might already be in use. Reusing consumer groups among non-related consumers can cause unexpected behavior and possibly lost events. All Logstash instances should use the same consumer group so that they can work together for processing events.
-* **Avoid overwriting offset with multiple Event Hubs.**
+
+[id="plugins-{type}s-{plugin}-bp-multihub"]
+====== Avoid overwriting offset with multiple Event Hubs
 The offsets (position) of the Event Hubs are stored in the configured Azure Blob store. The Azure Blob store uses paths like a file system to store the offsets.
 If the paths between multiple Event Hubs overlap, then the offsets may be stored incorrectly.
+
 To avoid duplicate file paths, use the advanced configuration model and make sure that at least one of these options is different per Event Hub:
+
 ** storage_connection
 ** storage_container (defaults to Event Hub name if not defined)
 ** consumer_group
-* **Set number of threads correctly.**
-The number of threads should equal the number of Event Hubs plus one (or more).
-Each Event Hub needs at least one thread. An additional thread is needed to help
-coordinate the other threads. The number of threads should not exceed the number
-of Event Hubs multiplied by the number of partitions per Event Hub plus one.
-Threads are currently available only as a global setting. If you are using multiple
-pipelines, the threads setting applies to each pipeline independently.
-** Sample scenario: Event Hubs = 4. Partitions on each Event Hub = 3.
-Minimum threads is 5 (4 Event Hubs plus one). Maximum threads is 13 (4 Event
-Hubs times 3 partitions plus one).
-** If you’re collecting activity logs from one event hub instance,
+
+
+[id="plugins-{type}s-{plugin}-bp-threads"]
+====== Set number of threads correctly
+
+By default, the number of threads used to service all event hubs is `16`. And while this
+may be sufficient for most use cases, throughput may be improved by refining this number.
+When servicing a large number of partitions across one or more event hubs, setting a higher
+value may result in improved performance. The maximum number of threads is not strictly bound
+by the total number of partitions being serviced, but setting the value much higher than
+that may mean that some threads are idle.
+
+NOTE: The number of threads *must* be greater than or equal to the number of Event hubs plus one.
+
+NOTE: Threads are currently available only as a global setting across all event hubs in a single `azure_event_hubs`
+input definition.
However if your configuration includes multiple `azure_event_hubs` inputs, the threads setting applies
+independently to each.
+
+**Sample scenarios**
+
+* Event Hubs = 4. Partitions on each Event Hub = 3.
+Minimum threads is 5 (4 Event Hubs plus one).
+* If you’re collecting activity logs from one event hub instance,
 then only 2 threads (1 Event Hub plus one) are required.
+
 [id="plugins-{type}s-{plugin}-eh_config_models"]
 ==== Configuration models
@@ -478,8 +500,8 @@ azure_event_hubs {
 ===== `threads`
 * Value type is <>
 * Minimum value is `2`
-* Default value is `4`
-
+* Default value is `16`
+
 Total number of threads used to process events. The value you set here applies to all Event Hubs. Even with advanced configuration, this value is a global
 setting, and can't be set per event hub.
@@ -487,7 +509,7 @@ setting, and can't be set per event hub.
 [source,ruby]
 ----
 azure_event_hubs {
- threads => 4
+ threads => 16
 }
 ----