[m3query] suspicious timestamp causing prometheus remote read query fail #2471

Closed
DanielQujun opened this issue Jul 20, 2020 · 12 comments · Fixed by #2483
@DanielQujun

When I sent http://192.168.0.107:9090/api/v1/series?match[]={__name__=~%22hpa_container_.*%22,%20container_label_io_kubernetes_pod_namespace!=%22%22,%20container_label_io_kubernetes_pod_name!=%22%22} to Prometheus, which is configured with M3DB as remote read storage, it returned

{"status":"success","data":[]}

But if I add start and end args to the URL, it does return data: http://192.168.0.107:9090/api/v1/series?match[]={__name__=~%22hpa_container_.*%22,%20container_label_io_kubernetes_pod_namespace!=%22%22,%20container_label_io_kubernetes_pod_name!=%22%22}&start=1595227791.278&end=1595228091.278

I read the Prometheus code; it sets start to minTime and end to maxTime when these two parameters are not defined in the URL.
https://github.com/prometheus/prometheus/blob/master/web/api/v1/api.go#L549-L555

var (
	minTime = time.Unix(math.MinInt64/1000+62135596801, 0).UTC()
	maxTime = time.Unix(math.MaxInt64/1000-62135596801, 999999999).UTC()

	minTimeFormatted = minTime.Format(time.RFC3339Nano)
	maxTimeFormatted = maxTime.Format(time.RFC3339Nano)
)

Meanwhile, the m3query log shows

{"level":"debug","ts":1595228093.5400887,"msg":"query resolved cluster namespace, will use most granular per result","query":"","m3query":"conjunction(regexp(__name__, hpa_container_.*),negation(term(container_label_io_kubernetes_pod_namespace, )), negation(term(container_label_io_kubernetes_pod_name, )))","start":6795364580.679345,"end":-6795364579.680346,"fanoutType":"coversAllQueryRange","namespace":"default","type":"unaggregated","retention":"48h0m0s","resolution":"0s","remote":false}

The start and end shown in the log are very suspicious; they look like the result of an overflow or a failed conversion.
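
For illustration, here is a minimal sketch of my guess at what is going on (the exact m3 code path is an assumption on my part): if those millisecond timestamps get converted to nanoseconds with a plain int64 multiplication, the value wraps around to almost exactly the numbers in the log above.

package main

import (
	"fmt"
	"math"
	"time"
)

func main() {
	// Prometheus's "unbounded" defaults from web/api/v1/api.go.
	minTime := time.Unix(math.MinInt64/1000+62135596801, 0).UTC()
	maxTime := time.Unix(math.MaxInt64/1000-62135596801, 999999999).UTC()

	// The remote read request carries these as millisecond timestamps;
	// both still fit into an int64.
	startMs := minTime.Unix()*1000 + int64(minTime.Nanosecond())/int64(time.Millisecond)
	endMs := maxTime.Unix()*1000 + int64(maxTime.Nanosecond())/int64(time.Millisecond)

	// Converting ms -> ns with a plain int64 multiplication silently wraps around.
	startNs := startMs * int64(time.Millisecond)
	endNs := endMs * int64(time.Millisecond)

	fmt.Println(float64(startNs) / 1e9) // ≈ 6795364580.679, the "start" from the log
	fmt.Println(float64(endNs) / 1e9)   // ≈ -6795364579.680, the "end" from the log
}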

@DanielQujun DanielQujun changed the title [m3query] suspicious timestamp causeing prometheus remote read query fail [m3query] suspicious timestamp causing prometheus remote read query fail Jul 20, 2020
@arnikola
Collaborator

Trying to repro this locally but no luck. Which m3 version are you using and how exactly are you calling the query?

@DanielQujun
Author

@arnikola Hi, thanks for your reply. My m3 version is v0.15.0:

/ # /bin/m3coordinator --version
2020/07/21 01:22:41 Go Runtime version: go1.13.12
2020/07/21 01:22:41 Build Version:      v0.15.0
2020/07/21 01:22:41 Build Revision:     dcef4ff7b
2020/07/21 01:22:41 Build Branch:       HEAD
2020/07/21 01:22:41 Build Date:         2020-06-03-16:54:54
2020/07/21 01:22:41 Build TimeUnix:     1591203294

and the Prometheus config is:

global:
  scrape_interval: 10s
  scrape_timeout: 10s
  evaluation_interval: 10s
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 192.168.0.221:9093
    scheme: http
    timeout: 10s
    api_version: v1
remote_read:
- url: http://192.168.0.40:7201/api/v1/prom/remote/read
  remote_timeout: 1m
  read_recent: true

You might need to delete the Prometheus local data cache, because Prometheus merges the local results with the remote ones.
I tried querying other labels, e.g. http://192.168.0.35:9090/api/v1/series?match[]={__name__=%22node_disk_io_now%22}, and that returns no data either.

@DanielQujun
Author

BTW, the Prometheus version is 2.12.2.

@arnikola
Collaborator

That's an old m3 version; you should update it. BTW, if you change read_recent to false you won't get results from the Prometheus local cache.
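
For reference, a minimal sketch of that change against the remote_read block posted above (same endpoint as in your config); per the Prometheus docs, read_recent controls whether remote reads are also issued for time ranges that local storage should have complete data for:

remote_read:
- url: http://192.168.0.40:7201/api/v1/prom/remote/read
  remote_timeout: 1m
  # read_recent: false skips remote reads for ranges that local storage
  # should already cover completely
  read_recent: false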

@DanielQujun
Author

DanielQujun commented Jul 21, 2020

Updated to v0.15.5; I still get the same symptom.

@arnikola
Collaborator

Just verifying: is this resolved by that PR? I haven't been able to repro it locally as of yet; I was going to ask if you have remote_write set up correctly in the Prometheus settings.

@DanielQujun
Author

DanielQujun commented Jul 23, 2020

Hi, this is my remote_write config:

remote_write:
- url: http://192.168.0.220:7201/api/v1/prom/remote/write
  remote_timeout: 30s
  queue_config:
    capacity: 100000
    max_shards: 1000
    min_shards: 1
    max_samples_per_send: 100
    batch_send_deadline: 5s
    min_backoff: 30ms
    max_backoff: 100ms

I think the data exists in M3DB: I can get metric data through the Prometheus /api/v1/query and /api/v1/query_range APIs, and also through the /api/v1/series API if I add the start and end query strings to the URL. It only fails without start or end.

@DanielQujun
Author

And yes, that PR would resolve it, but I'm not sure it's a good idea. 😳

@DanielQujun
Author

FYI, here are my m3 configs. I'm not experienced with M3DB; most of these were copied from the m3 docs 😳, so as you suspected, I might have misconfigured something:

  m3coordinator.yml: |
    listenAddress:
      type: "config"
      value: "0.0.0.0:7201"
    logging:
      level: info
    metrics:
      scope:
        prefix: "coordinator"
      prometheus:
        handlerPath: /metrics
        listenAddress: 0.0.0.0:7203
      sanitization: prometheus
      samplingRate: 1.0
      extended: none
    tagOptions:
      idScheme: quoted

    clusters:
    - namespaces:
          - namespace: default
            type: unaggregated
            retention: 48h
      client:
        writeConsistencyLevel: majority
        readConsistencyLevel: unstrict_majority
        config:
          service:
              env: default_env
              zone: embedded
              service: m3db
              cacheDir: /var/lib/m3kv
              etcdClusters:
              - zone: embedded
                endpoints:
                - https://192.168.0.28:4001
                - https://192.168.0.29:4001
                - https://192.168.0.31:4001
                tls:
                  crtPath: /etc/kubernetes/pki/server/server.pem
                  caCrtPath: /etc/kubernetes/pki/ca.pem
                  keyPath: /etc/kubernetes/pki/server/server-key.pem
---
  m3dbnode.yml: |
    db:
      logging:
        level: info

      metrics:
        prometheus:
          handlerPath: /metrics
        sanitization: prometheus
        samplingRate: 1.0
        extended: detailed

      listenAddress: 0.0.0.0:9005
      clusterListenAddress: 0.0.0.0:9006
      httpNodeListenAddress: 0.0.0.0:9002
      httpClusterListenAddress: 0.0.0.0:9003
      debugListenAddress: 0.0.0.0:9004

      hostID:
        resolver: environment
        envVarName: M3DB_HOST_ID

      client:
        writeConsistencyLevel: majority
        readConsistencyLevel: unstrict_majority

      gcPercentage: 100

      writeNewSeriesAsync: true
      writeNewSeriesLimitPerSecond: 1048576
      writeNewSeriesBackoffDuration: 2ms

      bootstrap:
        bootstrappers:
            - filesystem
            - commitlog
            - peers
            - uninitialized_topology
        commitlog:
          returnUnfulfilledForCorruptCommitLogFiles: false

      commitlog:
        flushMaxBytes: 524288
        flushEvery: 1s
        queue:
          calculationType: fixed
          size: 2097152

      fs:
        filePathPrefix: /var/lib/m3db

      config:
        service:
            env: default_env
            zone: embedded
            service: m3db
            cacheDir: /var/lib/m3kv
            etcdClusters:
            - zone: embedded
              endpoints:
              - https://192.168.0.28:4001
              - https://192.168.0.29:4001
              - https://192.168.0.31:4001
              tls:
                crtPath: /etc/kubernetes/pki/server/server.pem
                caCrtPath: /etc/kubernetes/pki/ca.pem
                keyPath: /etc/kubernetes/pki/server/server-key.pem
---
  m3query-config.yaml: |
    listenAddress:
      value: "0.0.0.0:7212"

    logging:
      level: info

    metrics:
      scope:
        prefix: "coordinator"
      prometheus:
        handlerPath: /metrics
        listenAddress: 0.0.0.0:7213 # until https://github.com/m3db/m3/issues/682 is resolved
      sanitization: prometheus
      samplingRate: 1.0
      extended: none

    tagOptions:
      idScheme: quoted

    clusters:
      - namespaces:
          - namespace: default
            type: unaggregated
            retention: 48h
        client:
          config:
            service:
              env: default_env
              zone: embedded
              service: m3db
              cacheDir: /var/lib/m3kv
              etcdClusters:
                - zone: embedded
                  endpoints:
                  - https://192.168.0.28:4001
                  - https://192.168.0.29:4001
                  - https://192.168.0.31:4001
                  tls:
                    crtPath: /etc/kubernetes/pki/server/server.pem
                    caCrtPath: /etc/kubernetes/pki/ca.pem
                    keyPath: /etc/kubernetes/pki/server/server-key.pem
          writeConsistencyLevel: majority
          readConsistencyLevel: unstrict_majority
          writeTimeout: 10s
          fetchTimeout: 15s
          connectTimeout: 20s
          writeRetry:
            initialBackoff: 500ms
            backoffFactor: 3
            maxRetries: 2
            jitter: true
          fetchRetry:
            initialBackoff: 500ms
            backoffFactor: 2
            maxRetries: 3
            jitter: true
          backgroundHealthCheckFailLimit: 4
          backgroundHealthCheckFailThrottleFactor: 0.5

@arnikola
Collaborator

arnikola commented Jul 23, 2020

Sorry, it auto-closed; leaving this open until it's verified as solved.

@DanielQujun
Author

Thanks for your great work! I'll verify it as soon as possible.

@DanielQujun
Author

Verified, closing.
