cloudv2: Higher resolution for Histograms #3302

Merged — 3 commits, Aug 31, 2023
49 changes: 31 additions & 18 deletions output/cloud/expv2/hdr.go
@@ -9,6 +9,11 @@ import (
)

const (
// defaultMinimumResolution is the default resolution used by the histogram.
// It allows for higher granularity compared to the basic 1.0 value,
// supporting floating-point values with up to 3 decimal digits.
defaultMinimumResolution = .001

// lowestTrackable represents the minimum value that the histogram tracks.
// Essentially, it excludes negative numbers.
// Most of the metrics tracked by histograms are durations
@@ -17,12 +22,6 @@ const (
// In the future, we may expand and include them,
// probably after https://github.com/grafana/k6/issues/763.
lowestTrackable = 0

// highestTrackable represents the maximum
// value that the histogram is able to track with high accuracy (0.1% of error).
// It should be a high enough
// and reasonable value for the k6 context; 2^30 = 1_073_741_824
highestTrackable = 1 << 30
)

// histogram represents a distribution
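As a quick illustration of the 3-decimal-digit granularity mentioned above (a standalone sketch, not lines from the diff): addToBucket divides each sample by MinimumResolution, and resolveBucketIndex then takes the ceiling of the upscaled value.

package main

import (
	"fmt"
	"math"
)

// upscale mimics the division by the resolution done in addToBucket,
// followed by the ceiling applied in resolveBucketIndex.
func upscale(v, resolution float64) float64 { return math.Ceil(v / resolution) }

func main() {
	fmt.Println(upscale(0.004, 1.0))    // 1: with a 1.0 resolution every sub-unit sample collapses into bucket 1
	fmt.Println(upscale(0.004, 0.001))  // 4
	fmt.Println(upscale(0.0042, 0.001)) // 5: now distinguishable from 0.004
}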
@@ -61,13 +60,18 @@ type histogram struct {

// Count counts the number of observed values.
Count uint32

// MinimumResolution represents the resolution used by the histogram.
// In principle, it is a multiplier factor for the tracked values.
MinimumResolution float64
}

func newHistogram() *histogram {
return &histogram{
Buckets: make(map[uint32]uint32),
Max: -math.MaxFloat64,
Min: math.MaxFloat64,
MinimumResolution: defaultMinimumResolution,
Buckets: make(map[uint32]uint32),
Max: -math.MaxFloat64,
Min: math.MaxFloat64,
}
}

@@ -85,14 +89,16 @@ func (h *histogram) addToBucket(v float64) {
h.Count++
h.Sum += v

if v > highestTrackable {
h.ExtraHighBucket++
return
}
v /= h.MinimumResolution

if v < lowestTrackable {
h.ExtraLowBucket++
return
}
if v > math.MaxInt64 {
h.ExtraHighBucket++
return
}

h.Buckets[resolveBucketIndex(v)]++
}
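A rough consequence of this change (back-of-the-envelope, assuming the default 0.001 resolution): the old code sent any raw value above highestTrackable = 1<<30 ≈ 1.07e9 to ExtraHighBucket, while the new guard rejects only upscaled values above math.MaxInt64, i.e. raw values above roughly math.MaxInt64 * 0.001 ≈ 9.2e15, so the trackable range grows by several orders of magnitude while sub-unit resolution improves at the same time.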
@@ -151,6 +157,9 @@ func histogramAsProto(h *histogram, time int64) *pbcloud.TrendHdrValue {
if h.ExtraHighBucket > 0 {
hval.ExtraHighValuesCounter = &h.ExtraHighBucket
}
// We don't expect to change the minimum resolution at runtime,
// so it is safe to use a pointer directly without creating a copy.
hval.MinResolution = &h.MinimumResolution
Contributor
Is there a reason to use a pointer, though?

It won't save memory, and it will add a pointer dereference when we actually have to write it.

Contributor

I guess the problem is that in the protobuf definition this field is not required 🤦

Not certain we can do anything about this 🤷

Contributor Author

Yeah, the reason is how Protobuf defines optional values. We haven't released the new cloud output v2 as the default yet, so we could think about migrating directly to a new Protobuf version that makes it required. But my feeling is that it would require a maintenance window.


I believe that if we un-optional this field, this doesn't really change anything for the backend?

Contributor Author

@esquonk if the suggestion is to change it only on the client, then it would require having different proto file versions between the backend and the client, which doesn't sound optimal to me.
Instead, if you mean changing it for both of them, then the problem is how to handle the versioning. If we set the field as required on the backend, then the already released k6 v0.46.0 will stop working if we force K6_CLOUD_API_VERSION=2 by overriding it from the backend. We could consider this a non-problem, as we are pretty confident that we will set it directly from the code when releasing v0.47.0, but I'm not sure. Thoughts?
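For context on the pointer discussion above, a minimal sketch of roughly what protoc-gen-go generates for a proto3 optional scalar (the struct below is illustrative, not the actual pbcloud.TrendHdrValue definition):

type TrendHdrValue struct {
	// ...other fields elided...

	// A proto3 `optional double` field has explicit presence, so the
	// generated Go field is a pointer: nil means "not set".
	MinResolution *float64
}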

return hval
}

@@ -164,7 +173,7 @@ func resolveBucketIndex(val float64) uint32 {
// We upscale to the next integer to ensure that each sample falls
// within a specific bucket, even when the value is fractional.
// This avoids under-representing the distribution in the histogram.
upscaled := uint32(math.Ceil(val))
upscaled := uint64(math.Ceil(val))

// In histograms, bucket boundaries are usually defined as multiples of powers of 2,
// allowing for efficient computation of bucket indexes.
@@ -181,11 +190,11 @@
// 2^10 = 1024 ~ 1000 = 10^3
// f(x) = 3*x + 1 - an empirical formula that works for us
// since f(2)=7 and f(3)=10
const k = uint32(7)
const k = uint64(7)

// 256 = 1 << (k+1)
if upscaled < 256 {
return upscaled
return uint32(upscaled)
}

// `nkdiff` helps us find the right bucket for `upscaled`. It does so by determining the
@@ -205,8 +214,12 @@
// = (n-k+1)<<k + u>>(n-k) - (1<<k) =
// = (n-k)<<k + u>>(n-k)
//
nkdiff := uint32(bits.Len32(upscaled>>k) - 1) // msb index
return (nkdiff << k) + (upscaled >> nkdiff)
nkdiff := uint64(bits.Len64(upscaled>>k)) - 1 // msb index

// We downcast safely because we don't expect the bucket index
// to hit the uint32 limit. The bucket represented by an index of MaxUint32
// would be a huge number, far bigger than the trackable limits.
return uint32((nkdiff << k) + (upscaled >> nkdiff))
}
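A standalone sketch (not lines from the diff) that reproduces the index arithmetic above for a few upscaled values, so the bit manipulation is easy to spot-check:

package main

import (
	"fmt"
	"math"
	"math/bits"
)

// sketchBucketIndex mirrors the updated resolveBucketIndex logic.
func sketchBucketIndex(val float64) uint32 {
	upscaled := uint64(math.Ceil(val))
	const k = uint64(7)
	if upscaled < 256 {
		return uint32(upscaled)
	}
	nkdiff := uint64(bits.Len64(upscaled>>k)) - 1 // msb index
	return uint32((nkdiff << k) + (upscaled >> nkdiff))
}

func main() {
	fmt.Println(sketchBucketIndex(100))  // 100: below 256 the index is the value itself
	fmt.Println(sketchBucketIndex(256))  // 256: 256>>7=2, nkdiff=1, (1<<7)+(256>>1) = 256
	fmt.Println(sketchBucketIndex(1000)) // 506: 1000>>7=7, nkdiff=2, (2<<7)+(1000>>2) = 506
}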

// Add implements the metricValue interface.