This repository has been archived by the owner on Jul 18, 2023. It is now read-only.

Implement Aggregation of Increment and Time measurements #55

Open · wants to merge 2 commits into base: dev

Conversation

cypressious (Contributor)

Fixes #9

@nblumhardt (Contributor) left a comment:

Thanks for the PR! Just added a couple of thoughts.

// Bus Portal (busliniensuche.de)
// ==========================================================================
// All rights reserved.
// ==========================================================================
Contributor:

Contributions need to be made under the same license as the rest of the codebase

Contributor Author:

Done

.WriteTo.Emitter(pts => list.AddRange(pts))
.CreateCollector();

collector.Emit(new[]
Contributor:

Does this design require memory space for all of the points in an interval until they're aggregated at emit time? Just clarifying my quick initial reading of the code ... If so it might need a re-think; one of the goals of aggregation, at the client side, would be to maintain a low bound on memory usage in the presence of vast numbers of points. Apologies if I'm misunderstanding, here!

Contributor Author:

You're right. Currently, the aggregation happens at emit time, i.e. if you're using the batcher, it happens when the batcher emits. My goal with this implementation was to reduce the amount of sent data. But your point is totally valid. Reducing the memory footprint would be a nice goal, too.

What is the scenario you're thinking about, i.e. what would be the batch and aggregation timespans?

Contributor Author:

One aspect to keep in mind: we shouldn't accidentally delay the sending of the measurements twice by buffering for the aggregation and then buffering again for the sending.

Contributor:

Thanks for the follow-up!

It just seems like a high-allocation metrics implementation is not ideal - apps that collect a lot of metrics generally need to be wary of this kind of caveat. It's not unusual to collect metrics at multiple thousands of hits/second, so if we're going to optimize for this case, aggregating at the point of collection seems pretty appealing (cf. other metrics implementations).
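A hedged sketch of the two designs under discussion (illustrative Python rather than the project's C#; all names are hypothetical): buffering every point until emit time grows memory with the hit rate, while aggregating at the point of collection keeps one running value per metric key.

```python
from collections import defaultdict

class BufferedCounter:
    """Buffer-then-aggregate: retains every point until emit time."""
    def __init__(self):
        self.points = []  # grows with every single hit

    def increment(self, name, value=1):
        self.points.append((name, value))

    def emit(self):
        # Aggregation happens only now, over the whole buffer.
        totals = defaultdict(int)
        for name, value in self.points:
            totals[name] += value
        self.points.clear()
        return dict(totals)

class AggregatingCounter:
    """Aggregate-at-collection: memory bounded by distinct metric keys."""
    def __init__(self):
        self.totals = defaultdict(int)

    def increment(self, name, value=1):
        self.totals[name] += value  # aggregated in place, no buffering

    def emit(self):
        snapshot = dict(self.totals)
        self.totals.clear()
        return snapshot
```

Both emit identical totals; only the memory profile between emits differs, which is the crux of the concern above.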

@nblumhardt (Contributor):

@cypressious thanks again for the time spent on this. It's a nicely-implemented PR, but I'm not keen to move forward with it because I don't think we'll ultimately be able to build on a buffer-then-aggregate design without an eventual reset to avoid the allocations. Let me know if you're keen to dig into it further 👍 Thanks!

@cypressious (Contributor Author):

@nblumhardt I will rework the PR. I'll implement a similar mechanism to the current batching approach where data will be periodically aggregated and then flushed down the pipeline.

cypressious force-pushed the feature-aggregate branch 2 times, most recently from 199aaab to ab0b8f1 on May 8, 2018 11:29

cypressious commented May 8, 2018

@nblumhardt I've finished the rework.

I've reimplemented the aggregation to run periodically, just like the batching. Both now derive from the same base class.

Because the aggregation now runs periodically, it needs to run before the batching in the pipeline.

I have one concern with the current implementation. Because points are allowed to have arbitrary timestamps, the points buffered during each tick aren't guaranteed to be from the same interval. That's why I still need to group them into time buckets. However, the buckets aren't necessarily aligned with the times at which the aggregation runs, so points from one timer interval will most likely fall into two buckets, even though the interval and the bucket length are the same.

In practice, this means we will (almost) always transmit twice the number of points.

I'll see if I can come up with a solution.
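A hedged sketch of that misalignment (illustrative Python rather than the project's C#; the 10-second interval and names are hypothetical): with clock-aligned buckets, the points flushed by a single timer tick usually straddle two buckets.

```python
INTERVAL = 10.0  # seconds; hypothetical aggregation interval

def bucket_of(timestamp, interval=INTERVAL):
    """Clock-aligned bucket: floor the timestamp to a multiple of the interval."""
    return int(timestamp // interval) * interval

# A timer tick at t=25 flushes the points collected during [15, 25] ...
points = [16.0, 19.9, 20.1, 24.5]
buckets = {bucket_of(t) for t in points}
# ... but those points land in two clock-aligned buckets (10.0 and 20.0),
# so one logical interval is emitted as two aggregated points per series.
```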


I've squashed the commits and rebased the PR on #59, so you should probably merge that one first.

@cypressious (Contributor Author):

@nblumhardt I think I've found a solution to the problem described above. Points within the timer interval, i.e. inside [now - interval, now], now always land in the same bucket. Remaining points are grouped into buckets as before.

I think this PR is ready for review.
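A hedged reconstruction of that fix (illustrative Python, not the PR's actual code; names are hypothetical): points inside the current tick window share one bucket keyed to the tick, while stragglers fall back to clock-aligned buckets.

```python
def assign_bucket(timestamp, now, interval=10.0):
    """Points within the current tick window [now - interval, now] share
    a single bucket keyed to the tick; older or future points fall back
    to clock-aligned bucketing."""
    if now - interval <= timestamp <= now:
        return now  # one bucket per timer tick
    return int(timestamp // interval) * interval

# All points collected during the tick ending at t=25 share one bucket,
# avoiding the two-buckets-per-interval duplication described above.
now = 25.0
fresh = [16.0, 19.9, 20.1, 24.5]
assert {assign_bucket(t, now) for t in fresh} == {25.0}
```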

@cypressious (Contributor Author):

@nblumhardt did you have time to look at the PR?

@nblumhardt (Contributor):

Hi @cypressious - thanks for the follow-up and sorry about the slow response; my time for this project is currently low.

Just to clarify my last comment:

I'm not keen to move forward with it because I don't think we'll ultimately be able to build on a buffer-then-aggregate design without an eventual reset to avoid the allocations. Let me know if you're keen to dig into it further

By buffer-then-aggregate, I mean collecting up the PointData and then aggregating in memory with collection operations, rather than allocating a bucket per aggregation key and performing the aggregation incrementally in place. This can't always be done without overhead, but for counters/sums and gauges, the total memory required to hold them can potentially be just a value, not a collection.

By an eventual reset, I mean that if we don't accommodate this goal in the design from the start, we probably won't be able to add it in a future version without starting afresh.

I think to dig in further requires some discussion of the goals/design space etc., and probably needs another contributor or two to join the conversation to provide some perspective.

@cypressious (Contributor Author):

@nblumhardt Thanks for the reply. I misunderstood your last comment but I think I get your point now.

For sums, the bucket approach seems ideal.

Other kinds of aggregations can be done too. For averages, we could store the sum and the count to compute sum/count.

I'm still interested in implementing this feature. Do you have suggestions on which contributors to get involved?
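As a sketch of the sum/count idea for averages (illustrative Python, not the project's code): a running average needs only two numbers per metric key, never the full sample list.

```python
class RunningAverage:
    """Incremental mean: O(1) memory per metric key (a sum and a count)
    instead of retaining every sample until emit time."""
    def __init__(self):
        self.total = 0.0
        self.count = 0

    def add(self, value):
        self.total += value
        self.count += 1

    @property
    def mean(self):
        # sum / count, guarding against the empty case
        return self.total / self.count if self.count else 0.0

avg = RunningAverage()
for v in (2.0, 4.0, 6.0):
    avg.add(v)
```

The same pattern extends to min/max (keep one value) and, with a little more state, to variance, which is part of the appeal of per-key incremental aggregation.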

@nblumhardt (Contributor):

👍 I'll put the call out over on the original (#9) ticket. Cheers!
