Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[query] Correctness checker #1993

Merged
merged 33 commits into from
Oct 28, 2019
Merged

[query] Correctness checker #1993

merged 33 commits into from
Oct 28, 2019

Conversation

arnikola
Copy link
Collaborator

@arnikola arnikola commented Oct 10, 2019

What this PR does / why we need it:
Adds a correctness checker utility, comparator, that tests a given query running against both M3Query and Prometheus, given the same base data.

Future work:

  • Use more realistic data sets than the current randomly generated values
  • Make a more comprehensive checker, to be able to point out where problems are exactly

Special notes for your reviewer:
The comparator acts as a gRPC server, yielding results on the Fetch endpoint.
An M3Query instance running in strict RPC mode connects to this server.
A Prometheus container is brought up, connecting to the M3Query instance by way of the remote_read endpoint.
(optional: see note below) A grafana instance is brought up that visually shows query results.

N.B. if the CI=false flag is provided (i.e. CI=false make docker-compatibility-test), the comparator stack is not brought down after tag completion, and a Grafana dashboard container is brought up, populated by dashboards generated by the given comparison queries. This provides a human usable endpoint to visually compare and debug failing runs.

Does this PR introduce a user-facing and/or backwards incompatible change?:

NONE

Does this PR require updating code package or user-facing documentation?:

NONE

@codecov
Copy link

codecov bot commented Oct 11, 2019

Codecov Report

Merging #1993 into master will decrease coverage by 0.6%.
The diff coverage is 39.2%.

Impacted file tree graph

@@           Coverage Diff            @@
##           master   #1993     +/-   ##
========================================
- Coverage    72.4%   71.7%   -0.7%     
========================================
  Files        1005    1008      +3     
  Lines       86270   86526    +256     
========================================
- Hits        62472   62084    -388     
- Misses      19603   20182    +579     
- Partials     4195    4260     +65
Flag Coverage Δ
#aggregator 82% <ø> (+10.8%) ⬆️
#cluster 85.6% <ø> (+12.1%) ⬆️
#collector 63.7% <ø> (-2.2%) ⬇️
#dbnode 77.8% <ø> (+4.3%) ⬆️
#m3em 73.2% <ø> (ø) ⬆️
#m3ninx 74.1% <ø> (+61.8%) ⬆️
#m3nsch 51.1% <ø> (-9.8%) ⬇️
#metrics 17.7% <ø> (-35.3%) ⬇️
#msg 74.9% <ø> (ø) ⬆️
#query 69% <39.2%> (+20.7%) ⬆️
#x 84.1% <ø> (+0.2%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e6d653b...c59ffdc. Read the comment docs.

@arnikola arnikola marked this pull request as ready for review October 18, 2019 20:58
@arnikola arnikola changed the title Arnikola/comparator [query] Correctness checker Oct 21, 2019
@@ -0,0 +1,45 @@
#!/usr/bin/env bash
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, could we just call into the docker-integration-tests setup.sh with SERVICES=(m3comparator m3query)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly yeah, but the dockerfiles are different; would prefer to avoid more bash hacking for now but can probably make it work if you're keen on it

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, it's fine. We should clean up some of these build tools at some point though, spreading more bash in a non-DRY way will get painful eventually.

rpc:
remotes:
- name: "remote"
remoteListenAddresses: ["m3comparator:9000"]
Copy link
Collaborator

@robskillington robskillington Oct 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Might be worth writing a few sentences in a README.md in this directory that explains how all the services connect to each other?

i.e. m3query native query -> remote GRPC fanout to comparator
prometheus -> m3query remote read -> remote GRPC fanout to comparator

func (r *Result) genID() {
var sb strings.Builder
// NB: this may clash but exact tag values are also checked, and this is a
// validation endpoint so there's less concern over correctness.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, might it problematic if someone pulls in this type for a real endpoint at some point (since it's in the prometheus package as common code)?

Should be as simple as:

tags := models.NewTags(0, nil)
for k, v := range r.Metric {
  tags.Add(model.Tag{Name: []byte(k), Value: []byte(v)})
}
r.id = string(tags.ID())

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm honestly leaning more to the side of removing this, it's annoying to maintain and the comparator should cover all it was meant to do

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I may leave it now and excise it along with some other annoying quirks later on

src/query/api/v1/handler/prometheus/native/common.go Outdated Show resolved Hide resolved
@@ -341,7 +341,7 @@ func getIndices(
l = i
}

if ts.Before(rBound) {
if !ts.After(rBound) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm this switches this behavior to be inclusive of the upper bound yeah?

Can we update the getIndices method comments to include that both lbound and rbound are inclusive?

Will this have correct effect in terms of not double counting datapoints if you split up some range, say [a,c], as [a,b] and [b,c] then do a sum of the final result together?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah Prom ends up double-counting them if they're on the step boundary, so figured we should match to that for now (could put in a check later for not double-dipping if we add in a different QL)

Copy link
Collaborator

@robskillington robskillington Oct 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah wow, ok no problems. Sounds good, so weird w.r.t this being the current status quo =/

src/query/functions/temporal/base.go Show resolved Hide resolved
align -= step
}

return offset - align
Copy link
Collaborator

@robskillington robskillington Oct 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm interesting, seems like this would read more straight forward if it used integer division perhaps? Just kind of tripped me out a bit with the double minus (i.e. align -= step which ends up actually adding step to result since final return is offset - align).

func adjustOffset(offset time.Duration, step time.Duration) time.Duration {
	if offset%step == 0 {
		return offset // This also takes care of when `offset == 0`
	}
	// Perform integer division to round down to nearest step, then add step to match Prometheus rounding up
	return (1 + (offset / step)) * step
}

// keeping end the same for now, might optimize later
p.TimeSpec.Start = p.TimeSpec.Start.Add(-1 * startShift)
alignedShift := startShift - (startShift % p.TimeSpec.Step)
p.TimeSpec.Start = p.TimeSpec.Start.Add(-1 * alignedShift)
Copy link
Collaborator

@robskillington robskillington Oct 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is cropping to the upper bound of the shift, does it need to be the lower bound?

i.e.

shift = (max offset + max range)
start - shift
   ↓
|----|----|----|----|
               ↑
              start

After adjustment:

shift = (max offset + max range)
start - (shift - shift%step)
     ↓
|----|----|----|----|
               ↑
              start

Is this desirable or is it more desirable to expand the start further so that the start is "at least" max offset + max range?

i.e.

shift = (max offset + max range)
start.Add(-1*shift).Truncate(step)
↓      
|----|----|----|----|
               ↑
              start
alignedShift := maxOffset + maxRange
p.TimeSpec.Start = p.TimeSpec.Start.Add(-1 * alignedShift).Truncate(p.TimeSpec.Step)

Copy link
Collaborator Author

@arnikola arnikola Oct 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry didn't call out this issue, but the latest PR did this by adding a processing step to go back a further StepSize if any alignment was required 👍

@@ -398,6 +400,15 @@ func (s *m3storage) SearchSeries(
}, nil
}

// CompleteTagsCompressed has the same behavior as CompleteTags.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it the same return type as well? What's the difference between CompleteTags and CompleteTagsCompressed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just to allow m3storage to satisfy m3.Querier, which the rpc server has been refactored to receive, to allow a more minimal implementation of the comparator itself.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see np

Copy link
Collaborator

@robskillington robskillington left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, let me know what you think of re: the last question:
#1993 (comment)

@arnikola
Copy link
Collaborator Author

LGTM, let me know what you think of re: the last question:
#1993 (comment)

Ah sorry, agreed with it and made the change but didn't respond to the question 😛

@arnikola arnikola merged commit 83ccc30 into master Oct 28, 2019
@arnikola arnikola deleted the arnikola/comparator branch October 28, 2019 14:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants