-
Notifications
You must be signed in to change notification settings - Fork 455
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[query] Correctness checker #1993
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1993 +/- ##
========================================
- Coverage 72.4% 71.7% -0.7%
========================================
Files 1005 1008 +3
Lines 86270 86526 +256
========================================
- Hits 62472 62084 -388
- Misses 19603 20182 +579
- Partials 4195 4260 +65
Continue to review full report at Codecov.
|
fdbef61
to
0c6a0fe
Compare
# Conflicts: # src/query/storage/m3/m3_mock.go # src/query/storage/m3/types.go # src/query/tsdb/remote/server.go
@@ -0,0 +1,45 @@ | |||
#!/usr/bin/env bash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, could we just call into the docker-integration-tests setup.sh with SERVICES=(m3comparator m3query)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly yeah, but the dockerfiles are different; would prefer to avoid more bash hacking for now but can probably make it work if you're keen on it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, it's fine. We should clean up some of these build tools at some point though, spreading more bash in a non-DRY way will get painful eventually.
rpc: | ||
remotes: | ||
- name: "remote" | ||
remoteListenAddresses: ["m3comparator:9000"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Might be worth writing a few sentences in a README.md
in this directory that explains how all the services connect to each other?
i.e. m3query native query -> remote GRPC fanout to comparator
prometheus -> m3query remote read -> remote GRPC fanout to comparator
func (r *Result) genID() { | ||
var sb strings.Builder | ||
// NB: this may clash but exact tag values are also checked, and this is a | ||
// validation endpoint so there's less concern over correctness. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, might it problematic if someone pulls in this type for a real endpoint at some point (since it's in the prometheus
package as common code)?
Should be as simple as:
tags := models.NewTags(0, nil)
for k, v := range r.Metric {
tags.Add(model.Tag{Name: []byte(k), Value: []byte(v)})
}
r.id = string(tags.ID())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm honestly leaning more to the side of removing this, it's annoying to maintain and the comparator should cover all it was meant to do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I may leave it now and excise it along with some other annoying quirks later on
@@ -341,7 +341,7 @@ func getIndices( | |||
l = i | |||
} | |||
|
|||
if ts.Before(rBound) { | |||
if !ts.After(rBound) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm this switches this behavior to be inclusive of the upper bound yeah?
Can we update the getIndices
method comments to include that both lbound and rbound are inclusive?
Will this have correct effect in terms of not double counting datapoints if you split up some range, say [a,c]
, as [a,b]
and [b,c]
then do a sum of the final result together?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah Prom ends up double-counting them if they're on the step boundary, so figured we should match to that for now (could put in a check later for not double-dipping if we add in a different QL)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah wow, ok no problems. Sounds good, so weird w.r.t this being the current status quo =/
src/query/parser/promql/parse.go
Outdated
align -= step | ||
} | ||
|
||
return offset - align |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm interesting, seems like this would read more straight forward if it used integer division perhaps? Just kind of tripped me out a bit with the double minus (i.e. align -= step
which ends up actually adding step
to result since final return is offset - align
).
func adjustOffset(offset time.Duration, step time.Duration) time.Duration {
if offset%step == 0 {
return offset // This also takes care of when `offset == 0`
}
// Perform integer division to round down to nearest step, then add step to match Prometheus rounding up
return (1 + (offset / step)) * step
}
// keeping end the same for now, might optimize later | ||
p.TimeSpec.Start = p.TimeSpec.Start.Add(-1 * startShift) | ||
alignedShift := startShift - (startShift % p.TimeSpec.Step) | ||
p.TimeSpec.Start = p.TimeSpec.Start.Add(-1 * alignedShift) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is cropping to the upper bound of the shift, does it need to be the lower bound?
i.e.
shift = (max offset + max range)
start - shift
↓
|----|----|----|----|
↑
start
After adjustment:
shift = (max offset + max range)
start - (shift - shift%step)
↓
|----|----|----|----|
↑
start
Is this desirable or is it more desirable to expand the start further so that the start is "at least" max offset + max range
?
i.e.
shift = (max offset + max range)
start.Add(-1*shift).Truncate(step)
↓
|----|----|----|----|
↑
start
alignedShift := maxOffset + maxRange
p.TimeSpec.Start = p.TimeSpec.Start.Add(-1 * alignedShift).Truncate(p.TimeSpec.Step)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sorry didn't call out this issue, but the latest PR did this by adding a processing step to go back a further StepSize if any alignment was required 👍
@@ -398,6 +400,15 @@ func (s *m3storage) SearchSeries( | |||
}, nil | |||
} | |||
|
|||
// CompleteTagsCompressed has the same behavior as CompleteTags. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it the same return type as well? What's the difference between CompleteTags
and CompleteTagsCompressed
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just to allow m3storage
to satisfy m3.Querier
, which the rpc server has been refactored to receive, to allow a more minimal implementation of the comparator itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see np
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, let me know what you think of re: the last question:
#1993 (comment)
Ah sorry, agreed with it and made the change but didn't respond to the question 😛 |
What this PR does / why we need it:
Adds a correctness checker utility,
comparator
, that tests a given query running against both M3Query and Prometheus, given the same base data.Future work:
Special notes for your reviewer:
The comparator acts as a gRPC server, yielding results on the
Fetch
endpoint.An M3Query instance running in strict RPC mode connects to this server.
A Prometheus container is brought up, connecting to the M3Query instance by way of the
remote_read
endpoint.(optional: see note below) A grafana instance is brought up that visually shows query results.
N.B. if the
CI=false
flag is provided (i.e.CI=false make docker-compatibility-test
), the comparator stack is not brought down after tag completion, and a Grafana dashboard container is brought up, populated by dashboards generated by the given comparison queries. This provides a human usable endpoint to visually compare and debug failing runs.Does this PR introduce a user-facing and/or backwards incompatible change?:
Does this PR require updating code package or user-facing documentation?: