Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix compound query key in metric aggregation with bucket_interval #71

Merged
merged 2 commits into from
Apr 22, 2021

Conversation

just1900
Copy link
Contributor

@just1900 just1900 commented Mar 31, 2021

Fix metric aggregation while using compound query key and bucket_interval together

What happened here

  1. config rules with type metric_aggregation, compound query_key and bucket_interval like below.
type: metric_aggregation
query_key: ["prometheus.labels.instance","prometheus.labels.app"]
bucket_interval:
  seconds: 30
  1. Debugging into the call stack.
    the payload_data param in function add_aggregation_data will contain compound bucket_aggskey looks like below https://github.com/jertel/elastalert/blob/c8949c1e715ca0b9b5b923a5e3625c52163a4068/elastalert/ruletypes.py#L1034-L1041
{
    "bucket_aggs": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
            {
                "key": "kafka",
                "doc_count": 32916,
                "bucket_aggs": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                        {
                            "key": "stress-test-kafka",
                            "doc_count": 17628,
                            "interval_aggs": {
                                "buckets": [
                                    {
                                        "key_as_string": "2021-03-31T02:11:30.000Z",
                                        "key": 1617156690000,
                                        "doc_count": 1469,
                                        "metric_prometheus.metrics.kafka_controller_kafkacontroller_activecontrollercount_sum": {
                                            "value": 1
                                        }
                                    },
...(duplicated character removed)

then after indexing https://github.com/jertel/elastalert/blob/c8949c1e715ca0b9b5b923a5e3625c52163a4068/elastalert/ruletypes.py#L1039 and unwarp_term_bucket https://github.com/jertel/elastalert/blob/c8949c1e715ca0b9b5b923a5e3625c52163a4068/elastalert/ruletypes.py#L1048-L1053
the aggregation_data param in function check_matches looks like

{
    "key": "kafka",
    "doc_count": 32916,
    "bucket_aggs": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
            {
                "key": "stress-test-kafka",
                "doc_count": 17628,
                "interval_aggs": {
                    "buckets": [
                        {
                            "key_as_string": "2021-03-31T02:11:30.000Z",
                            "key": 1617156690000,
                            "doc_count": 1469,
                            "metric_prometheus.metrics.kafka_controller_kafkacontroller_activecontrollercount_sum": {
                                "value": 1
                            }
                        },
...(duplicated character removed)

What we could see is that the interval_aggs key is not unwrapped here. Also, the function check_matches_recursive only unwrap bucket_aggs recursively.
https://github.com/jertel/elastalert/blob/c8949c1e715ca0b9b5b923a5e3625c52163a4068/elastalert/ruletypes.py#L1112-L1127
So if the bucket_interval is configured in the rules, and line 1127 will run into the KeyError while indexing self.metric_key.

How to Fix it

  • adding unwrap for interval_aggs key in function check_matches_recursive

@just1900
Copy link
Contributor Author

the pr in the upstream Yelp/elastalert#3161

Copy link
Owner

@jertel jertel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution. I see some related tests in tests/rules_test.py, specifically test_base_aggregation_payloads(). Would be good to see some coverage added for this scenario.

@mrfroggg
Copy link
Contributor

Could this solution works for percentage_match?
Yelp/elastalert#1281 / Yelp/elastalert#2133

@jertel
Copy link
Owner

jertel commented Apr 22, 2021

Good question. It looks unrelated but there could be something I'm not seeing.

The submitter hasn't responded to the unit test topic. If anyone else needs this change merged, please add a quick unit test to the existing rules_test.py, and I'll merge it in.

@just1900
Copy link
Contributor Author

just1900 commented Apr 22, 2021

@jertel PTAL, I've just got some spare time to look into this, sorry for the long delay.
The unit test have been added and I've modified the implementation for the reason that there should be only one interval_aggs bucket, so if the recursive call have seen interval_agg, it means the recursive call reaches the end.

@mrfroggg I checked your commented issue and the code in Yelp#2133, few things I have noticed

Copy link
Owner

@jertel jertel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contributions @just1900

@jertel jertel merged commit 7b99e21 into jertel:alt Apr 22, 2021
@just1900 just1900 changed the title fix compound query key in metric aggregation fix compound query key in metric aggregation with bucket_interval Apr 22, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 24, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants