
stats, metrics: deduplicate TimeSeries before making CreateTimeSeriesRequest-s #73

Closed
odeke-em opened this issue Jan 19, 2019 · 1 comment

@odeke-em (Contributor)

A bug/inadequacy I found while doing a live test with the OpenCensus Agent: if metrics are streamed from multiple sources and more than one TimeSeries in a single export shares the same metric type and labels, Stackdriver's backend returns an error, because each CreateTimeSeriesRequest expects unique TimeSeries. This problem has plagued even the stats exporter for years, and the advised work-around was setting view.SetReportingPeriod, but that merely masked the problem by giving aggregation time to run within each exporting period.
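
For context, that work-around in user code looked roughly like this (a minimal sketch; the 60-second value is illustrative):

import (
	"time"

	"go.opencensus.io/stats/view"
)

func init() {
	// Work-around: a long reporting period lets view aggregation collapse
	// data before export, masking (not fixing) the duplicate-TimeSeries bug.
	view.SetReportingPeriod(60 * time.Second)
}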

When metrics are streamed in concurrently, all bets are off. For example, given this data:

{
    "name": "projects/census-demos",
    "time_series": [
        {
            "metric": {
                "type": "custom.googleapis.com/opencensus/oce/dev/latency",
                "labels": {
                    "client": "cli",
                    "method": "repl",
                    "opencensus_task": "[email protected]"
                }
            },
            "resource": {
                "type": "global"
            },
            "points": [
                {
                    "interval": {
                        "end_time": {
                            "seconds": 1547860312,
                            "nanos": 655706000
                        },
                        "start_time": {
                            "seconds": 1547857792,
                            "nanos": 658197000
                        }
                    },
                    "value": {
                        "Value": {
                            "DistributionValue": {
                                "count": 399,
                                "mean": 6461.507067283209,
                                "sum_of_squared_deviation": 5680369911.614502,
                                "bucket_options": {
                                    "Options": {
                                        "ExplicitBuckets": {
                                            "bounds": [
                                                0,
                                                10,
                                                50,
                                                100,
                                                200,
                                                400,
                                                800,
                                                1000,
                                                1400,
                                                2000,
                                                5000,
                                                10000
                                            ]
                                        }
                                    }
                                },
                                "bucket_counts": [
                                    0,
                                    0,
                                    1,
                                    0,
                                    3,
                                    3,
                                    21,
                                    5,
                                    17,
                                    15,
                                    89,
                                    153,
                                    92
                                ]
                            }
                        }
                    }
                }
            ]
        },
        {
            "metric": {
                "type": "custom.googleapis.com/opencensus/oce/dev/process_counts",
                "labels": {
                    "client": "cli",
                    "method": "repl",
                    "opencensus_task": "[email protected]"
                }
            },
            "resource": {
                "type": "global"
            },
            "points": [
                {
                    "interval": {
                        "end_time": {
                            "seconds": 1547860312,
                            "nanos": 655722000
                        },
                        "start_time": {
                            "seconds": 1547857792,
                            "nanos": 658197000
                        }
                    },
                    "value": {
                        "Value": {
                            "Int64Value": 399
                        }
                    }
                }
            ]
        },
        {
            "metric": {
                "type": "custom.googleapis.com/opencensus/oce/dev/latency",
                "labels": {
                    "client": "cli",
                    "method": "repl",
                    "opencensus_task": "[email protected]"
                }
            },
            "resource": {
                "type": "global"
            },
            "points": [
                {
                    "interval": {
                        "end_time": {
                            "seconds": 1547860372,
                            "nanos": 653868000
                        },
                        "start_time": {
                            "seconds": 1547857792,
                            "nanos": 658197000
                        }
                    },
                    "value": {
                        "Value": {
                            "DistributionValue": {
                                "count": 409,
                                "mean": 6443.895616823964,
                                "sum_of_squared_deviation": 5882240635.357754,
                                "bucket_options": {
                                    "Options": {
                                        "ExplicitBuckets": {
                                            "bounds": [
                                                0,
                                                10,
                                                50,
                                                100,
                                                200,
                                                400,
                                                800,
                                                1000,
                                                1400,
                                                2000,
                                                5000,
                                                10000
                                            ]
                                        }
                                    }
                                },
                                "bucket_counts": [
                                    0,
                                    0,
                                    1,
                                    0,
                                    4,
                                    3,
                                    22,
                                    6,
                                    17,
                                    16,
                                    90,
                                    156,
                                    94
                                ]
                            }
                        }
                    }
                }
            ]
        }
    ]
}

we get this error:

err: rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: Field timeSeries[2] had an invalid value: Duplicate TimeSeries encountered. Only one point can be written per TimeSeries per request.: timeSeries[2]

This is because both

  • A
        {
            "metric": {
                "type": "custom.googleapis.com/opencensus/oce/dev/latency",
                "labels": {
                    "client": "cli",
                    "method": "repl",
                    "opencensus_task": "[email protected]"
                }
            },
            "resource": {
                "type": "global"
            },
            "points": [
                {
                    "interval": {
                        "end_time": {
                            "seconds": 1547860312,
                            "nanos": 655706000
                        },
                        "start_time": {
                            "seconds": 1547857792,
                            "nanos": 658197000
                        }
                    },
                    "value": {
                        "Value": {
                            "DistributionValue": {
                                "count": 399,
                                "mean": 6461.507067283209,
                                "sum_of_squared_deviation": 5680369911.614502,
                                "bucket_options": {
                                    "Options": {
                                        "ExplicitBuckets": {
                                            "bounds": [
                                                0,
                                                10,
                                                50,
                                                100,
                                                200,
                                                400,
                                                800,
                                                1000,
                                                1400,
                                                2000,
                                                5000,
                                                10000
                                            ]
                                        }
                                    }
                                },
                                "bucket_counts": [
                                    0,
                                    0,
                                    1,
                                    0,
                                    3,
                                    3,
                                    21,
                                    5,
                                    17,
                                    15,
                                    89,
                                    153,
                                    92
                                ]
                            }
                        }
                    }
                }
            ]
        }

and

  • B
        {
            "metric": {
                "type": "custom.googleapis.com/opencensus/oce/dev/latency",
                "labels": {
                    "client": "cli",
                    "method": "repl",
                    "opencensus_task": "[email protected]"
                }
            },
            "resource": {
                "type": "global"
            },
            "points": [
                {
                    "interval": {
                        "end_time": {
                            "seconds": 1547860372,
                            "nanos": 653868000
                        },
                        "start_time": {
                            "seconds": 1547857792,
                            "nanos": 658197000
                        }
                    },
                    "value": {
                        "Value": {
                            "DistributionValue": {
                                "count": 409,
                                "mean": 6443.895616823964,
                                "sum_of_squared_deviation": 5882240635.357754,
                                "bucket_options": {
                                    "Options": {
                                        "ExplicitBuckets": {
                                            "bounds": [
                                                0,
                                                10,
                                                50,
                                                100,
                                                200,
                                                400,
                                                800,
                                                1000,
                                                1400,
                                                2000,
                                                5000,
                                                10000
                                            ]
                                        }
                                    }
                                },
                                "bucket_counts": [
                                    0,
                                    0,
                                    1,
                                    0,
                                    4,
                                    3,
                                    22,
                                    6,
                                    17,
                                    16,
                                    90,
                                    156,
                                    94
                                ]
                            }
                        }
                    }
                }
            ]
        }

have the same metric type "custom.googleapis.com/opencensus/oce/dev/latency".
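
A minimal sketch of the deduplication this issue proposes, against the monitoringpb types (the helper names metricSignature and splitTimeSeries are hypothetical, not part of the exporter):

package main

import (
	"fmt"
	"sort"

	monitoringpb "google.golang.org/genproto/googleapis/monitoring/v3"
)

// metricSignature identifies a TimeSeries by its metric type and labels.
// Two TimeSeries with the same signature must not share a request.
func metricSignature(ts *monitoringpb.TimeSeries) string {
	labels := ts.GetMetric().GetLabels()
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic order regardless of map iteration
	sig := ts.GetMetric().GetType()
	for _, k := range keys {
		sig += fmt.Sprintf("|%s=%s", k, labels[k])
	}
	return sig
}

// splitTimeSeries greedily partitions the input so that no batch contains
// two TimeSeries with the same signature; each batch can then back its own
// CreateTimeSeriesRequest.
func splitTimeSeries(all []*monitoringpb.TimeSeries) [][]*monitoringpb.TimeSeries {
	var batches [][]*monitoringpb.TimeSeries
	var seen []map[string]bool
	for _, ts := range all {
		sig := metricSignature(ts)
		placed := false
		for i := range batches {
			if !seen[i][sig] {
				seen[i][sig] = true
				batches[i] = append(batches[i], ts)
				placed = true
				break
			}
		}
		if !placed {
			batches = append(batches, []*monitoringpb.TimeSeries{ts})
			seen = append(seen, map[string]bool{sig: true})
		}
	}
	return batches
}

Applied to the payload above, the two /latency entries land in different batches, so each CreateTimeSeriesRequest sees that metric only once.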

@odeke-em odeke-em self-assigned this Jan 19, 2019
@odeke-em odeke-em changed the title metrics: deduplicate TimeSeries before making CreateTimeSeriesRequest-s stats, metrics: deduplicate TimeSeries before making CreateTimeSeriesRequest-s Jan 19, 2019
@odeke-em (Contributor, Author)

I've verified that this problem has plagued the stats exporter since day 1. PR coming up shortly.

odeke-em added a commit that referenced this issue Jan 19, 2019
Since its inception, the stats exporter split Stackdriver Metrics only into chunks of maxUploadSize (200), but Metric-s with the exact same Type were still uploaded in the same CreateTimeSeriesRequest, which would cause:

    err: rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written:
    Field timeSeries[?] had an invalid value: Duplicate TimeSeries encountered.
    Only one point can be written per TimeSeries per request.: timeSeries[?]

The previous remedy relied on setting SetReportingPeriod to 60+s, which gave stats/view.Data time to aggregate.

This change splits up such Metrics so that even if uploads are made in less than 60s, every CreateTimeSeriesRequest contains only unique TimeSeries and won't trip up Stackdriver's backend.

Fixes #73
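
A sketch of how that could compose with the existing maxUploadSize chunking (the constant's value comes from the message above; chunkTimeSeries is a hypothetical name):

const maxUploadSize = 200

// chunkTimeSeries splits one already-deduplicated batch into chunks of at
// most maxUploadSize entries, each of which becomes one request, e.g.:
//   &monitoringpb.CreateTimeSeriesRequest{
//       Name:       "projects/census-demos",
//       TimeSeries: chunk,
//   }
func chunkTimeSeries(batch []*monitoringpb.TimeSeries) [][]*monitoringpb.TimeSeries {
	var chunks [][]*monitoringpb.TimeSeries
	for len(batch) > maxUploadSize {
		chunks = append(chunks, batch[:maxUploadSize])
		batch = batch[maxUploadSize:]
	}
	if len(batch) > 0 {
		chunks = append(chunks, batch)
	}
	return chunks
}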
odeke-em added a commit that referenced this issue Jan 19, 2019
odeke-em added a commit that referenced this issue Jan 20, 2019
odeke-em added a commit that referenced this issue Jan 20, 2019