Feature: Allow tag used for planting to be configurable #24

Open
warmfusion opened this issue Apr 4, 2016 · 6 comments

Comments

@warmfusion

Scenario

All events from all systems are coming through to a Forest plugin to push into ElasticSearch using the first two elements of the tag as an index.

The rest of the tag is application-specific metadata used by members of various teams.

Example tags: live.productA.haproxy.access or staging.productA.nginx.error.

Problem

Because each unique tag results in a newly planted connection to ElasticSearch, a considerable number of new connections are established even though the configuration is identical and the connection could be reused.

Proposal

Add an argument to the match section of the configuration that defines a 'grove' of similar trees. Even though there may be hundreds of unique trees (one per unique tag), they would be grouped into common groves (based on this new setting) and share a common connection to ES (in this example).

Perhaps something that'd let me do this:

<match **>
    @type forest
    grove ${tag_parts[0..2]}
    subtype elasticsearch
    <template>
        logstash_format true
        logstash_prefix ${tag_parts[0..2]}
        hosts elasticsearch.priv.example.com
    </template>
</match>

I'd then expect that, for events tagged into the system, new trees are planted only per grove, and not for each tag.

input tag                                           "grove"
live.product.haproxy.access                         live.product.haproxy
live.product.application.serviceA.event.subkey      live.product.application
live.product.application.serviceB.event.otherkey    live.product.application
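
To make the expected behaviour concrete, here is a minimal Ruby sketch of the idea, not the plugin's actual code. The `grove_key` and `output_for` helpers and the plain hash standing in for the mapping are all hypothetical; the string value is a placeholder where a real output plugin instance would be planted.

```ruby
# Hypothetical helper: derive the grove key from a tag with a Ruby range,
# mirroring the proposed `grove ${tag_parts[0..2]}` setting.
def grove_key(tag, range = 0..2)
  tag.split('.')[range].join('.')
end

# `mapping` stands in for a tag-to-output hash, keyed by grove instead of tag.
# A new entry is planted only the first time a grove is seen.
def output_for(mapping, tag)
  key = grove_key(tag)
  mapping[key] ||= "output-for-#{key}" # placeholder for a real output instance
end

mapping = {}
tags = [
  'live.product.haproxy.access',
  'live.product.application.serviceA.event.subkey',
  'live.product.application.serviceB.event.otherkey'
]
tags.each { |tag| output_for(mapping, tag) }

puts mapping.keys.inspect
# prints ["live.product.haproxy", "live.product.application"]
```

Three tags arrive, but only two entries are planted, one per grove.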

I believe the change would be to the @mapping hash, and more specifically around here

@tagomoris
Owner

I understand your problem, but in general the forest plugin cannot guarantee that the grove configuration value groups plants consistently with their configured parameters. A misconfiguration might break the behavior of output plugins.
So I think the forest plugin cannot provide such an option.

On the other hand, the Fluentd v0.14 plugin API will provide variable tag handling natively. I think it will satisfy your requirement.

@warmfusion
Author

While I appreciate the concern around misconfiguration of plugins, I'd argue that any sufficiently advanced plugin has scope for breaking itself. 😃

I don't think I'll be able to use 0.14 for a while yet; still working on transitioning from Ruby 1.9.3 😢

The impact of inconsistent hash keys on the mapping makes sense, and it absolutely follows that the possibility of having one plant when the output needs multiple would be remarkably confusing as events may not be handled consistently. That being said, would my suggested implementation provide a solution to my stated problem?

I'm wondering if I need to try and implement the changes myself to suit my use case, at least till we can get to 0.14.

@tagomoris
Owner

My answer to this proposal: I have no motivation to write it myself, but I'll consider merging a pull request for it if the code is good enough.
Thank you for the detailed proposal.

@macdjord

How about a simpler partial solution? Frequently, I write Forest configurations with no tag-specific content at all - i.e. I want a.** to do this, and b.c.* to do that, but all the tags in each category are handled exactly the same. This is easy to check for - if a config never uses __TAG__, ${tag}, etc., then anything matching that <case> or <template> will have the same config, guaranteed. In that situation, you could just create a single tree for all matching tags.
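
A rough Ruby sketch of that check, under some assumptions: the regex only covers the placeholder forms mentioned in this thread (the plugin may support others), and `tag_placeholder` and `tag_independent?` are hypothetical names, not part of the plugin.

```ruby
# Assumed placeholder forms, taken from this thread only:
# __TAG__, ${tag}, and ${tag_parts[...]}. The real plugin may accept more.
def tag_placeholder
  /__TAG__|\$\{tag\}|\$\{tag_parts\[[^\]]*\]\}/
end

# True when no value in the template depends on the tag, so every matching
# tag would produce exactly the same configuration.
def tag_independent?(template)
  template.values.none? { |value| value.to_s =~ tag_placeholder }
end

static_template = {
  'hosts' => 'elasticsearch.priv.example.com',
  'logstash_format' => 'true'
}
tagged_template = static_template.merge('logstash_prefix' => '${tag_parts[0]}')

puts tag_independent?(static_template)
puts tag_independent?(tagged_template)
```

When the check returns true, a single tree could serve every tag matching that section.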

@macdjord

macdjord commented Aug 18, 2016

More complete solution: Make __TAG__ == grove. That is, if you define a grove, then __TAG__ (and ${tag}, ${tag_parts[X]}, etc.) contain only the parts of the tag that were matched in the grove name.
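
Sketched in Ruby, assuming the grove is the first three tag parts as in the original proposal. `grove_parts` and `expand_for_grove` are hypothetical helpers, and the substitution handles only the placeholder forms named in this thread.

```ruby
# Hypothetical: the grove is the first three tag parts, as in the proposal.
def grove_parts(tag, range = 0..2)
  tag.split('.')[range]
end

# Expand placeholders using only the grove, so parts of the tag outside the
# grove can never leak into a planted tree's configuration.
def expand_for_grove(value, tag)
  parts = grove_parts(tag)
  grove = parts.join('.')
  value
    .gsub('__TAG__', grove)
    .gsub('${tag}', grove)
    .gsub(/\$\{tag_parts\[(\d+)\]\}/) { parts[$1.to_i].to_s }
end

tag = 'live.product.application.serviceA.event.subkey'
puts expand_for_grove('prefix-${tag}', tag)
puts expand_for_grove('${tag_parts[1]}', tag)
```

In this sketch an index beyond the grove, such as ${tag_parts[3]}, expands to an empty string; a real implementation might prefer to reject it as a configuration error.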

@macdjord

Another approach: When planting a new tree, cache the config used to initialize it. Every time a new tag comes in, generate the tree config from the <template>, the matching <case> if any, and the tag, but don't plant the tree yet. Compare this config to the configs of all previously created trees. If it is identical to one of them, forward the new tag to that existing tree. Only if the new config is distinct from all previous configs do you actually create a new tree for it.

This approach would be completely automatic (the user would not need to manually define 'groves' at all) and would be functionally identical to the current system.
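
A minimal Ruby sketch of the caching idea, with assumptions flagged: `expand` and `tree_for` are hypothetical helpers, the expansion handles only ${tag} and ${tag_parts[N]}, and a string stands in for a planted tree. It relies on Ruby comparing hashes by content when they are used as hash keys.

```ruby
# Hypothetical expansion step: substitute the tag into every template value.
def expand(template, tag)
  parts = tag.split('.')
  expanded = {}
  template.each do |key, value|
    expanded[key] = value
      .gsub('${tag}', tag)
      .gsub(/\$\{tag_parts\[(\d+)\]\}/) { parts[$1.to_i].to_s }
  end
  expanded
end

# `trees` maps an expanded config to the tree planted for it. A tree is
# planted only when no earlier tag produced an identical config.
def tree_for(trees, template, tag)
  config = expand(template, tag)
  trees[config] ||= "tree-#{trees.size + 1}" # placeholder for a real output
end

template = {
  'hosts' => 'elasticsearch.priv.example.com',
  'logstash_prefix' => '${tag_parts[0]}.${tag_parts[1]}'
}
trees = {}
first  = tree_for(trees, template, 'live.productA.haproxy.access')
second = tree_for(trees, template, 'live.productA.nginx.error')
third  = tree_for(trees, template, 'staging.productA.nginx.error')

puts first == second # same expanded config, so the tree is shared
puts trees.size
```

The two live.productA tags expand to the same config and share one tree, while the staging tag gets its own.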
