[DOC] Restructure and expand best practices doc (#55)
* Increase the default number of threads to 16

Increase the default number of threads passed to the EventProcessorHost
to 16, to match the standard default.

Rework the docs to reflect the change to the defaults, the fact
that event_hubs*partitions is not the maximum number of threads, and
the fact that more threads may yield better performance.

* Restructure best practices docs

* Remove rb file from pr

Co-authored-by: Rob Bavey <[email protected]>
karenzone and robbavey authored Mar 12, 2020
1 parent 37d8f72 commit e6ce080
Showing 1 changed file with 38 additions and 16 deletions.
54 changes: 38 additions & 16 deletions docs/index.asciidoc
@@ -78,35 +78,57 @@ https://portal.azure.com[Azure Portal]`-> Blob Storage account -> Access keys`.
Here are some guidelines to help you avoid data conflicts that can cause lost
events.

* **Create a Logstash consumer group.**
* <<plugins-{type}s-{plugin}-bp-group>>
* <<plugins-{type}s-{plugin}-bp-multihub>>
* <<plugins-{type}s-{plugin}-bp-threads>>

[id="plugins-{type}s-{plugin}-bp-group"]
====== Create a Logstash consumer group
Create a new consumer group specifically for Logstash. Do not use `$Default` or
any other consumer group that might already be in use. Reusing consumer groups
among unrelated consumers can cause unexpected behavior and possibly lost
events. All Logstash instances should use the same consumer group so that they can
work together to process events.
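A minimal sketch of a basic-mode input that reads with a dedicated consumer group. The consumer group name `logstash`, the Event Hub name, and both connection strings are placeholders; the consumer group is assumed to already exist on the Event Hub, and every Logstash instance reading that Event Hub would use the same value.

[source,ruby]
----
azure_event_hubs {
   # Placeholder connection string; EntityPath names the Event Hub to read
   event_hub_connections => ["Endpoint=sb://example1...;EntityPath=event_hub_name1"]
   # Assumed name of a consumer group created only for Logstash
   consumer_group => "logstash"
   # Placeholder Blob Storage connection string used to persist offsets
   storage_connection => "DefaultEndpointsProtocol=https;AccountName=example...."
}
----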
* **Avoid overwriting offset with multiple Event Hubs.**

[id="plugins-{type}s-{plugin}-bp-multihub"]
====== Avoid overwriting offset with multiple Event Hubs
The offsets (position) of the Event Hubs are stored in the configured Azure Blob
store. The Azure Blob store uses paths like a file system to store the offsets.
If the paths between multiple Event Hubs overlap, then the offsets may be stored
incorrectly.

To avoid duplicate file paths, use the advanced configuration model and make
sure that at least one of these options is different per Event Hub:

** storage_connection
** storage_container (defaults to Event Hub name if not defined)
** consumer_group
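A sketch of the advanced configuration model with two Event Hubs that share one storage account but write their offsets to different containers, so their offset paths cannot overlap. The Event Hub names, container names, consumer group, and connection strings are all placeholders.

[source,ruby]
----
azure_event_hubs {
   config_mode => "advanced"
   # Placeholder Blob Storage connection shared by both Event Hubs
   storage_connection => "DefaultEndpointsProtocol=https;AccountName=example...."
   event_hubs => [
      {"event_hub_name1" => {
         event_hub_connection => "Endpoint=sb://example1..."
         consumer_group => "logstash"
         # Distinct container keeps this Event Hub's offsets separate
         storage_container => "logstash-offsets-hub1"
      }},
      {"event_hub_name2" => {
         event_hub_connection => "Endpoint=sb://example2..."
         consumer_group => "logstash"
         storage_container => "logstash-offsets-hub2"
      }}
   ]
}
----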
* **Set number of threads correctly.**
The number of threads should equal the number of Event Hubs plus one (or more).
Each Event Hub needs at least one thread. An additional thread is needed to help
coordinate the other threads. The number of threads should not exceed the number
of Event Hubs multiplied by the number of partitions per Event Hub plus one.
Threads are currently available only as a global setting. If you are using multiple
pipelines, the threads setting applies to each pipeline independently.
** Sample scenario: Event Hubs = 4. Partitions on each Event Hub = 3.
Minimum threads is 5 (4 Event Hubs plus one). Maximum threads is 13 (4 Event
Hubs times 3 partitions plus one).
** If you’re collecting activity logs from one event hub instance,


[id="plugins-{type}s-{plugin}-bp-threads"]
====== Set number of threads correctly

By default, the number of threads used to service all event hubs is `16`. While this
may be sufficient for most use cases, throughput may be improved by tuning this number.
When servicing a large number of partitions across one or more event hubs, setting a higher
value may result in improved performance. The maximum number of threads is not strictly bounded
by the total number of partitions being serviced, but setting the value much higher than
that may leave some threads idle.

NOTE: The number of threads *must* be greater than or equal to the number of Event Hubs plus one.

NOTE: Threads are currently available only as a global setting across all event hubs in a single `azure_event_hubs`
input definition. However, if your configuration includes multiple `azure_event_hubs` inputs, the threads setting applies
independently to each.
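A sketch of two separate `azure_event_hubs` inputs, each with its own `threads` value. The thread counts are illustrative only (each satisfies the minimum of one Event Hub plus one), the connection strings are placeholders, and other options such as `consumer_group` and `storage_connection` are omitted for brevity.

[source,ruby]
----
# First input: its 8 threads serve only event_hub_name1
azure_event_hubs {
   event_hub_connections => ["Endpoint=sb://example1...;EntityPath=event_hub_name1"]
   threads => 8
}
# Second input: its 4 threads are counted separately from the first input
azure_event_hubs {
   event_hub_connections => ["Endpoint=sb://example2...;EntityPath=event_hub_name2"]
   threads => 4
}
----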

**Sample scenarios**

* Event Hubs = 4. Partitions on each Event Hub = 3.
Minimum threads is 5 (4 Event Hubs plus one).
* If you’re collecting activity logs from one event hub instance,
then only 2 threads (1 Event Hub plus one) are required.
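The sizing rule and the sample scenarios can be checked with a small Ruby sketch. The helpers `min_threads` and `assess_threads` are hypothetical and not part of the plugin. The `:possibly_idle` cutoff (more threads than one per partition plus one) is an assumed reading of "much higher" than the partition count, not a documented limit.

```ruby
# Hypothetical sizing helpers; not part of logstash-input-azure_event_hubs.

# Documented minimum: one thread per Event Hub plus one coordinating thread.
def min_threads(event_hubs)
  event_hubs + 1
end

# Classifies a threads value against the minimum and an assumed idle heuristic.
def assess_threads(threads, event_hubs, partitions_per_hub)
  total_partitions = event_hubs * partitions_per_hub
  if threads < min_threads(event_hubs)
    :too_low        # below the documented minimum
  elsif threads > total_partitions + 1
    :possibly_idle  # assumed heuristic: more threads than partitions plus one
  else
    :ok
  end
end

puts min_threads(4)            # 4 Event Hubs, 3 partitions each; prints 5
puts min_threads(1)            # one Event Hub for activity logs; prints 2
puts assess_threads(16, 4, 3)  # default of 16 threads; prints possibly_idle
```

With the default of `16` threads and 4 Event Hubs of 3 partitions each (12 partitions), the sketch flags that some threads may sit idle. Raising the partition count, or lowering `threads` toward 13, moves the result to `:ok`.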


[id="plugins-{type}s-{plugin}-eh_config_models"]
==== Configuration models

@@ -478,16 +500,16 @@ azure_event_hubs {
===== `threads`
* Value type is <<number,number>>
* Minimum value is `2`
* Default value is `4`
* Default value is `16`

Total number of threads used to process events. The value you set here applies
to all Event Hubs. Even with advanced configuration, this value is a global
setting, and can't be set per event hub.

[source,ruby]
----
azure_event_hubs {
threads => 4
threads => 16
}
----
