
Implement native caching for higher scale lookups #10

Open
acchen97 opened this issue Feb 8, 2019 · 4 comments
Labels
enhancement (New feature or request)

Comments

@acchen97

acchen97 commented Feb 8, 2019

There has already been some demand for native caching of HTTP lookups in this plugin. This would help enable higher throughput without needing to use the plugin in conjunction with third-party caching systems like Memcached.

Please feel free to +1 if you are interested in this feature.

acchen97 added the enhancement (New feature or request) label on Feb 8, 2019
@yaauie

yaauie commented Feb 22, 2019

I envision a two-part solution:

  1. Support for proxies (including https) would be trivial to add, and would allow users to configure a local caching proxy (e.g., Squid Cache) that obeyed all of the semantics and standards of the web and kept that complexity out of our maintenance domain.
  2. A naïve LRU in-memory cache (perhaps around LogStash::Filters::Http#request_http(verb, url, options)) is also possible, if a little more complex. It would spare users of this plugin from configuring and running the above-mentioned caching proxy, at the cost of breaking some of those semantics (e.g., no upstream cache invalidation) and introducing some unpredictability in the plugin's memory consumption; a rough sketch follows below.
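To make (2) concrete, here is a minimal sketch, assuming all lookups funnel through a single request_http(verb, url, options) call; the NaiveLruCache class and the wiring shown are hypothetical illustrations, not code from this plugin:

# Hypothetical sketch, not plugin code: a naïve size-bounded LRU memoisation
# around request_http(verb, url, options).
class NaiveLruCache
  def initialize(max_size)
    @max_size = max_size
    @store = {}          # Ruby Hashes preserve insertion order
    @mutex = Mutex.new   # filter workers may call concurrently
  end

  # Returns the cached value for `key`, or computes it via the block,
  # stores it, and evicts the oldest entry when over capacity.
  def fetch(key)
    @mutex.synchronize do
      if @store.key?(key)
        value = @store.delete(key)  # re-insert to mark as most recently used
        @store[key] = value
        return value
      end
      value = yield
      @store[key] = value
      @store.shift if @store.size > @max_size  # drop the least-recently-used entry
      value
    end
  end
end

# Hypothetical wiring inside the filter:
#   @cache ||= NaiveLruCache.new(1_000)
#   response = @cache.fetch([verb, url, options]) { request_http(verb, url, options) }

Eviction here is purely size-based, which is exactly where the broken semantics mentioned above come from: an upstream change is not visible until the corresponding entry happens to be evicted.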

@telune

telune commented Jan 30, 2020

I'll add my vote to this one; it would be ideal for our data enrichment use case. We are currently using the jdbc_streaming filter, but it's a less-than-ideal choice. The perfect fit would be the http filter with caching capabilities just like the aforementioned jdbc_streaming, only making HTTP calls instead of SQL queries.

@grownuphacker

+1
Just came to add my interest in this. So far the only method I've gotten to work is hammering my REST source with the exact same request.

@vjt

vjt commented Jul 1, 2020

-1

I don't think that LogStash should have a caching layer, as there is already external software (nginx, memcached) that does this well and is easy to integrate with LogStash.

I have two use cases for which I am using external caches:

  • Querying an internal API service over HTTP to enrich logs coming from different sources. The information on the API service seldom changes, and logstash processes hundreds of events per second. I set up an nginx listening on localhost that proxies my API service, configured its disk cache, and pointed logstash to it (see [1] below).
  • Keeping a mapping of client IP → user name. If an incoming log event has both a clientip and a user field, I store the mapping in memcached. If I have a clientip but no user field, I query memcached to enrich the log event (see [2] below).

That said, I find the following pluses in having the caching layer external:

  • being able to tune, change the behaviour of, or replace the caching layer altogether without involving LogStash or having to reconfigure it
  • not losing the cache contents if I need to restart LogStash, and conversely being able to flush the cache without involving LogStash
  • being able to scale out the cache separately from LogStash

Sorry for the verbosity; I hope this is useful for your use cases too.

[1] local caching proxy

# Disk cache: 40 MB shared key zone, entries expire after 24h of inactivity, 1 GB max size.
proxy_cache_path /srv/cache/foobar levels=1:2 keys_zone=foobar:40m inactive=24h max_size=1g;

server {
  listen localhost:8084;

  access_log off;

  location / {
    proxy_pass            https://foobar;

    proxy_ignore_headers  Cache-Control;

    proxy_set_header      Host foobar.example.org;
    proxy_buffering       on;
    proxy_cache           foobar;
    proxy_cache_key       $uri$is_args$args;
    proxy_cache_valid     200 404 1h;  # cache 200 and 404 responses for one hour
    proxy_cache_valid     any 5m;      # everything else for five minutes
    proxy_cache_lock      on;
    proxy_cache_use_stale error timeout invalid_header updating http_500 http_502 http_503 http_504;

    add_header X-Cache-Status $upstream_cache_status;
  }
}

upstream foobar {
  server foobar.example.org:443;
}

[2] memcached enrichment

# We have a mapping from the event, store it in the cache for usage by other future events.
#
if [clientip] and [user] and [user] !~ '(?:^(?:unauthenticated|_?system|anonymous|\[?unknown\]?)$)' {
  memcached {
    hosts => ["cache-01"]
    namespace => "logstash-ip"
    set => { "[user]" => "%{clientip}" }
    ttl => 86400 # Avoid stale lookups
  }
}

# We don't have a mapping from the event, try to look it up from the cache.
#
if [clientip] and ! [user] {
  # Check the cache
  #
  memcached {
    hosts => ["cache-01"]
    namespace => "logstash-ip"
    get => { "%{clientip}" => "[user]" }
    add_tag => ["user_from_cache"]
  }
}
