[query] Correctness checker #1993

arnikola · 2019-10-10T08:14:34Z

What this PR does / why we need it:
Adds a correctness checker utility, comparator, that tests a given query by running it against both M3Query and Prometheus over the same base data.

Future work:

Use more realistic data sets than the current randomly generated values
Make a more comprehensive checker, to be able to point out where problems are exactly

Special notes for your reviewer:
The comparator acts as a gRPC server, yielding results on the Fetch endpoint.
An M3Query instance running in strict RPC mode connects to this server.
A Prometheus container is brought up, connecting to the M3Query instance by way of the remote_read endpoint.
(optional: see note below) A grafana instance is brought up that visually shows query results.

N.B. if the CI=false flag is provided (i.e. CI=false make docker-compatibility-test), the comparator stack is not brought down after test completion, and a Grafana dashboard container is brought up, populated by dashboards generated from the given comparison queries. This provides a human-usable endpoint to visually compare and debug failing runs.

Does this PR introduce a user-facing and/or backwards incompatible change?:

NONE

Does this PR require updating code package or user-facing documentation?:

NONE

codecov · 2019-10-11T19:39:36Z

Codecov Report

Merging #1993 into master will decrease coverage by 0.6%.
The diff coverage is 39.2%.

@@           Coverage Diff            @@
##           master   #1993     +/-   ##
========================================
- Coverage    72.4%   71.7%   -0.7%     
========================================
  Files        1005    1008      +3     
  Lines       86270   86526    +256     
========================================
- Hits        62472   62084    -388     
- Misses      19603   20182    +579     
- Partials     4195    4260     +65

Flag          Coverage Δ
#aggregator   82% <ø> (+10.8%)       ⬆️
#cluster      85.6% <ø> (+12.1%)     ⬆️
#collector    63.7% <ø> (-2.2%)      ⬇️
#dbnode       77.8% <ø> (+4.3%)      ⬆️
#m3em         73.2% <ø> (ø)          ⬆️
#m3ninx       74.1% <ø> (+61.8%)     ⬆️
#m3nsch       51.1% <ø> (-9.8%)      ⬇️
#metrics      17.7% <ø> (-35.3%)     ⬇️
#msg          74.9% <ø> (ø)          ⬆️
#query        69% <39.2%> (+20.7%)   ⬆️
#x            84.1% <ø> (+0.2%)      ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e6d653b...c59ffdc.

scripts/comparator/compare.go

robskillington · 2019-10-25T03:30:45Z

scripts/comparator/setup.sh

@@ -0,0 +1,45 @@
+#!/usr/bin/env bash


Hm, could we just call into the docker-integration-tests setup.sh with SERVICES=(m3comparator m3query)?

Mostly yeah, but the dockerfiles are different; would prefer to avoid more bash hacking for now but can probably make it work if you're keen on it

Hm, it's fine. We should clean up some of these build tools at some point though, spreading more bash in a non-DRY way will get painful eventually.

src/cmd/services/m3comparator/main/main.go

src/cmd/services/m3comparator/main/querier.go

robskillington · 2019-10-27T16:44:10Z

scripts/comparator/m3query.yml

+rpc:
+  remotes:
+    - name: "remote"
+      remoteListenAddresses: ["m3comparator:9000"]


nit: Might be worth writing a few sentences in a README.md in this directory that explains how all the services connect to each other?

i.e. m3query native query -> remote GRPC fanout to comparator
prometheus -> m3query remote read -> remote GRPC fanout to comparator

scripts/comparator/utils/compare_utilities.go

src/cmd/services/m3comparator/main/main.go

src/cmd/services/m3comparator/main/querier.go

src/cmd/services/m3comparator/main/series_iterator_builder.go

robskillington · 2019-10-27T17:24:29Z

src/query/api/v1/handler/prometheus/common.go

+func (r *Result) genID() {
+	var sb strings.Builder
+	// NB: this may clash but exact tag values are also checked, and this is a
+	// validation endpoint so there's less concern over correctness.


Hm, might it be problematic if someone pulls in this type for a real endpoint at some point (since it's in the prometheus package as common code)?

Should be as simple as:

tags := models.NewTags(0, nil)
for k, v := range r.Metric {
	tags.Add(models.Tag{Name: []byte(k), Value: []byte(v)})
}
r.id = string(tags.ID())

I'm honestly leaning more to the side of removing this, it's annoying to maintain and the comparator should cover all it was meant to do

I think I may leave it now and excise it along with some other annoying quirks later on
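
To illustrate the clash concern in this thread, here is a standalone sketch of building a deterministic ID from a label map. It does not use M3's models package; `seriesID` and its separators are hypothetical, and the point is only that label names need sorting because Go map iteration order is not stable.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// seriesID builds a deterministic ID from a label map by sorting the label
// names first; ranging over the map directly would give a different order
// from run to run. The '=' and ',' separators are illustrative only: labels
// that contain them can still collide, which is the kind of clash the
// original comment accepts for a validation-only endpoint.
func seriesID(metric map[string]string) string {
	names := make([]string, 0, len(metric))
	for name := range metric {
		names = append(names, name)
	}
	sort.Strings(names)

	var sb strings.Builder
	for _, name := range names {
		sb.WriteString(name)
		sb.WriteByte('=')
		sb.WriteString(metric[name])
		sb.WriteByte(',')
	}
	return sb.String()
}

func main() {
	fmt.Println(seriesID(map[string]string{"job": "api", "__name__": "up"}))
}
```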

src/query/api/v1/handler/prometheus/native/common.go

robskillington · 2019-10-27T17:40:02Z

src/query/functions/temporal/base.go

@@ -341,7 +341,7 @@ func getIndices(
 			l = i
 		}

-		if ts.Before(rBound) {
+		if !ts.After(rBound) {


Hm this switches this behavior to be inclusive of the upper bound yeah?

Can we update the getIndices method comments to include that both lbound and rbound are inclusive?

Will this have the correct effect in terms of not double counting datapoints if you split up some range, say [a,c], as [a,b] and [b,c], then sum the final results together?

Yeah Prom ends up double-counting them if they're on the step boundary, so figured we should match to that for now (could put in a check later for not double-dipping if we add in a different QL)

Ah wow, ok no problems. Sounds good, so weird w.r.t this being the current status quo =/
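
A small sketch of the double counting discussed above, using plain integer timestamps rather than the real getIndices code. `sumInclusive` is a hypothetical helper; it only shows what happens when both bounds are inclusive and a datapoint sits exactly on the split boundary.

```go
package main

import "fmt"

// sumInclusive sums the values whose timestamps fall in [lo, hi], with both
// bounds inclusive.
func sumInclusive(ts []int64, vals []float64, lo, hi int64) float64 {
	var sum float64
	for i, t := range ts {
		if t >= lo && t <= hi {
			sum += vals[i]
		}
	}
	return sum
}

func main() {
	ts := []int64{0, 10, 20}
	vals := []float64{1, 2, 4}

	// One query over [0, 20] versus two queries over [0, 10] and [10, 20].
	whole := sumInclusive(ts, vals, 0, 20)
	split := sumInclusive(ts, vals, 0, 10) + sumInclusive(ts, vals, 10, 20)

	// The datapoint at t=10 is counted in both halves of the split.
	fmt.Println(whole, split)
}
```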

src/query/functions/temporal/base.go

robskillington · 2019-10-27T18:03:27Z

src/query/parser/promql/parse.go

+		align -= step
+	}
+
+	return offset - align


Hm interesting, seems like this would read as more straightforward if it used integer division perhaps? Just kind of tripped me out a bit with the double minus (i.e. align -= step, which ends up actually adding step to the result since the final return is offset - align).

func adjustOffset(offset time.Duration, step time.Duration) time.Duration {
	if offset%step == 0 {
		return offset // This also takes care of when `offset == 0`
	}
	// Perform integer division to round down to nearest step, then add step to match Prometheus rounding up
	return (1 + (offset / step)) * step
}

robskillington · 2019-10-27T18:38:01Z

src/query/plan/physical.go

-	// keeping end the same for now, might optimize later
-	p.TimeSpec.Start = p.TimeSpec.Start.Add(-1 * startShift)
+	alignedShift := startShift - (startShift % p.TimeSpec.Step)
+	p.TimeSpec.Start = p.TimeSpec.Start.Add(-1 * alignedShift)


This is cropping to the upper bound of the shift, does it need to be the lower bound?

i.e.

shift = (max offset + max range)
start - shift
↓
|----|----|----|----|
↑
start

After adjustment:

shift = (max offset + max range)
start - (shift - shift%step)
↓
|----|----|----|----|
↑
start

Is this desirable or is it more desirable to expand the start further so that the start is "at least" max offset + max range?

i.e.

shift = (max offset + max range)
start.Add(-1*shift).Truncate(step)
↓
|----|----|----|----|
↑
start

alignedShift := maxOffset + maxRange
p.TimeSpec.Start = p.TimeSpec.Start.Add(-1 * alignedShift).Truncate(p.TimeSpec.Step)

Ah sorry didn't call out this issue, but the latest PR did this by adding a processing step to go back a further StepSize if any alignment was required 👍

robskillington · 2019-10-27T18:41:33Z

src/query/storage/m3/storage.go

@@ -398,6 +400,15 @@ func (s *m3storage) SearchSeries(
 	}, nil
 }

+// CompleteTagsCompressed has the same behavior as CompleteTags.


Is it the same return type as well? What's the difference between CompleteTags and CompleteTagsCompressed?

It's just to allow m3storage to satisfy m3.Querier, which the rpc server has been refactored to receive, to allow a more minimal implementation of the comparator itself.
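
As a rough illustration of the pattern described in that reply, here is a sketch of a method added only so a type satisfies an interface. The `Querier` interface and `storage` type below are trimmed-down stand-ins with made-up signatures, not the real m3.Querier or m3storage.

```go
package main

import "fmt"

// Querier is a hypothetical stand-in for the interface the RPC server
// accepts; the real interface has different methods and signatures.
type Querier interface {
	CompleteTags(query string) ([]string, error)
	CompleteTagsCompressed(query string) ([]string, error)
}

// storage is a hypothetical stand-in for the existing storage type.
type storage struct{}

func (s *storage) CompleteTags(query string) ([]string, error) {
	return []string{"job", "instance"}, nil
}

// CompleteTagsCompressed has the same behavior as CompleteTags; it exists
// so that *storage satisfies Querier.
func (s *storage) CompleteTagsCompressed(query string) ([]string, error) {
	return s.CompleteTags(query)
}

// Compile-time check that *storage implements Querier.
var _ Querier = (*storage)(nil)

func main() {
	var q Querier = &storage{}
	tags, err := q.CompleteTagsCompressed("up")
	fmt.Println(tags, err)
}
```

Accepting the narrower interface on the server side is what lets a small test double such as the comparator stand in for full storage.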

robskillington

LGTM, let me know what you think re: the last question:
#1993 (comment)

arnikola · 2019-10-28T13:00:43Z

LGTM, let me know what you think re: the last question:
#1993 (comment)

Ah sorry, agreed with it and made the change but didn't respond to the question 😛

arnikola added 3 commits October 10, 2019 03:25

[query] Comparator

242a821

Updating docker files

93a336f

Missed adding comparator.go

c28c83a

robskillington force-pushed the master branch from fdbef61 to 0c6a0fe October 15, 2019 06:42

arnikola added 11 commits October 16, 2019 13:31

WIP

483c60a

Fix docker composition

91dbde0

Generating series

9cfb7c8

Added grafana to comparator

4106a4f

WIP

23d8c6b

Merge branch 'master' into arnikola/comparator

b12e598

# Conflicts:
#	src/query/storage/m3/m3_mock.go
#	src/query/storage/m3/types.go
#	src/query/tsdb/remote/server.go

!!

f667c12

Fix this thing

9a80a8b

Cleaning up code, aligning offsets.

a650ea8

Tests passing

b5bd97e

Cleanup

8fafe9c

arnikola marked this pull request as ready for review October 18, 2019 20:58

arnikola added 3 commits October 18, 2019 16:58

Merge branch 'master' into arnikola/comparator

554279b

Fixing offset stuff

5674e5b

Adding comparator tests, making grafana local only

bf6cf4e

arnikola changed the title from "Arnikola/comparator" to "[query] Correctness checker" Oct 21, 2019

arnikola added 4 commits October 21, 2019 18:16

Fix some tests, generated files should now be gitignored.

962c78a

Fix query parsing

315689e

sed awk etc

13ad1c8

Merge branch 'master' into arnikola/comparator

28ea7ea

arnikola added 3 commits October 22, 2019 20:19

Fix some scripting, add more query types

88f3e42

Merge branch 'arnikola/comparator' of github.com:m3db/m3 into arnikola/comparator

9ad7e83

Fixing CI tests

75a5037

robskillington reviewed Oct 25, 2019

scripts/comparator/compare.go (outdated, resolved)
src/cmd/services/m3comparator/main/main.go (resolved)
src/cmd/services/m3comparator/main/querier.go (resolved)
src/cmd/services/m3comparator/main/querier.go (outdated, resolved)

arnikola added 4 commits October 25, 2019 15:09

PR response

da32a42

Fixes

3461b3c

Merge branch 'master' into arnikola/comparator

f447464

Merge branch 'master' into arnikola/comparator

516c369