Ignite netrisk with HyperLogLog? #3
Comments
Thanks for your comments and PR #2 - I just merged it.
I'm guessing you mean "cardinality" of something here? Possibly UAs per subnet? Some time ago I sketched out the typical entities in web interactions and the expected cardinality of each of their relations, with reasons for exceptions. I don't have any plans to develop this "netrisk" project beyond its current simple form; it was built to support a blog post showing how some of the Elasticsearch aggregations can be applied in practice. Feel free to fork it, of course, if you find it useful :)
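For readers following along, "UAs per subnet" could be expressed as a `terms` aggregation with a `cardinality` sub-aggregation (Elasticsearch's HyperLogLog++-based distinct count). The snippet below is only a sketch of the request body: the field names `subnet` and `user_agent`, the aggregation names, and the helper function are assumptions for illustration, not part of netrisk or its mappings.

```python
import json


def build_ua_cardinality_query(subnet_field="subnet", ua_field="user_agent",
                               top_n=10, precision_threshold=3000):
    """Build a search body that ranks subnets by approximate distinct UA count.

    All field and aggregation names here are hypothetical placeholders.
    """
    return {
        "size": 0,  # only aggregation results are needed, no hits
        "aggs": {
            "by_subnet": {
                "terms": {
                    "field": subnet_field,
                    "size": top_n,
                    # order buckets by the metric sub-aggregation below
                    "order": {"distinct_uas": "desc"},
                },
                "aggs": {
                    "distinct_uas": {
                        "cardinality": {
                            "field": ua_field,
                            # trades memory for accuracy below this count
                            "precision_threshold": precision_threshold,
                        }
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_ua_cardinality_query(), indent=2))
```

The resulting dictionary would be sent as the body of a `_search` request against whatever index holds the web logs; which client is used for that is left open here.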
Sorry, cardinality is indeed what I meant. We actually use entity-centric views, yet currently don't cache them back to ES. Would this be the same as an ES-backed cache for event sourcing? It's funny you mention this example, as we've hacked together a browser plugin that ships fingerprints to ES, where the application and webserver logs reside. Using significant terms and a graph, it's pretty powerful on very diverse datasets. Depending on cluster size, high-cardinality fields might as well be used instead of significant terms for cached performance. Do you happen to know an entity-centric indexing / event sourcing framework that supports both ETL (per single event) and ES-backed aggregations (historic aggregated events)? Browser shipper: https://git.bitsensor.io/ruben/browser/blob/master/src/index.js
http://snowplowanalytics.com/product/ is a big project in this area with "trackers" for a variety of client platforms.
@markharwood,
Great work on significant terms, and maybe an even greater visualization of the 4 strategies in your comment!
Working in the same space, yet having access to more detailed data, we have found input cardinality to be more strongly related to attack impact than relative volume, if one vector has to be chosen.
I would love to commit to doing a PR for you, but to demo that in your project the dummy data would have to include URI/UA hashes or values.
By the way, will we see you in Berlin at GOTO?
Ruben
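
To make the HyperLogLog idea in the title concrete, here is a toy, pure-Python sketch of estimating distinct user agents per subnet. It is not how netrisk or Elasticsearch implement it (Elasticsearch uses HyperLogLog++ with additional bias correction); the class, the helper function, and the `(subnet, ua)` event shape are all assumptions made for illustration.

```python
import hashlib
import math
from collections import defaultdict


class HyperLogLog:
    """Minimal HyperLogLog with 2**p registers (illustrative only)."""

    def __init__(self, p=10):
        if not 7 <= p <= 16:
            raise ValueError("p must be between 7 and 16")
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        # standard bias-correction constant, valid for m >= 128
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, value):
        digest = hashlib.sha1(value.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")       # 64-bit hash
        width = 64 - self.p
        idx = h >> width                            # top p bits pick a register
        rest = h & ((1 << width) - 1)               # remaining bits
        rank = width - rest.bit_length() + 1        # position of leftmost 1-bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self):
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # small-range correction: fall back to linear counting
            estimate = self.m * math.log(self.m / zeros)
        return estimate


def distinct_uas_per_subnet(events, p=10):
    """events: iterable of (subnet, user_agent) pairs (hypothetical shape)."""
    sketches = defaultdict(lambda: HyperLogLog(p))
    for subnet, ua in events:
        sketches[subnet].add(ua)
    return {subnet: hll.count() for subnet, hll in sketches.items()}
```

Each subnet keeps a fixed-size sketch regardless of how many events it sees, so repeated values add volume but not cardinality, which is the distinction between input cardinality and relative volume drawn above.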