[m3query] suspicious timestamp causing prometheus remote read query fail #2471

Closed
DanielQujun opened this issue Jul 20, 2020 · 12 comments · Fixed by #2483
@DanielQujun

When I sent http://192.168.0.107:9090/api/v1/series?match[]={__name__=~%22hpa_container_.*%22,%20container_label_io_kubernetes_pod_namespace!=%22%22,%20container_label_io_kubernetes_pod_name!=%22%22} to Prometheus, which is configured with M3DB as remote read storage, it returned

{"status":"success","data":[]}

But if I add start and end args to the URL, it does return data: http://192.168.0.107:9090/api/v1/series?match[]={__name__=~%22hpa_container_.*%22,%20container_label_io_kubernetes_pod_namespace!=%22%22,%20container_label_io_kubernetes_pod_name!=%22%22}&start=1595227791.278&end=1595228091.278

I read the Prometheus code; it sets start to minTime and end to maxTime when these two parameters are not defined in the URL.
https://github.com/prometheus/prometheus/blob/master/web/api/v1/api.go#L549-L555

var (
	minTime = time.Unix(math.MinInt64/1000+62135596801, 0).UTC()
	maxTime = time.Unix(math.MaxInt64/1000-62135596801, 999999999).UTC()

	minTimeFormatted = minTime.Format(time.RFC3339Nano)
	maxTimeFormatted = maxTime.Format(time.RFC3339Nano)
)

Meanwhile, the m3query log shows

{"level":"debug","ts":1595228093.5400887,"msg":"query resolved cluster namespace, will use most granular per result","query":"","m3query":"conjunction(regexp(__name__, hpa_container_.*),negation(term(container_label_io_kubernetes_pod_namespace, )), negation(term(container_label_io_kubernetes_pod_name, )))","start":6795364580.679345,"end":-6795364579.680346,"fanoutType":"coversAllQueryRange","namespace":"default","type":"unaggregated","retention":"48h0m0s","resolution":"0s","remote":false}

The start and end shown in the log are very suspicious; they look like the result of an overflow or a failed conversion.
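
For illustration, here is a minimal sketch of my guess at what is going on (the exact m3 code path is an assumption on my part): if those millisecond timestamps get converted to nanoseconds with a plain int64 multiplication, the value wraps around to almost exactly the numbers in the log above.

package main

import (
	"fmt"
	"math"
	"time"
)

func main() {
	// Prometheus's "unbounded" defaults from web/api/v1/api.go.
	minTime := time.Unix(math.MinInt64/1000+62135596801, 0).UTC()
	maxTime := time.Unix(math.MaxInt64/1000-62135596801, 999999999).UTC()

	// The remote read request carries these as millisecond timestamps;
	// both still fit into an int64.
	startMs := minTime.Unix()*1000 + int64(minTime.Nanosecond())/int64(time.Millisecond)
	endMs := maxTime.Unix()*1000 + int64(maxTime.Nanosecond())/int64(time.Millisecond)

	// Converting ms -> ns with a plain int64 multiplication silently wraps around.
	startNs := startMs * int64(time.Millisecond)
	endNs := endMs * int64(time.Millisecond)

	fmt.Println(float64(startNs) / 1e9) // ≈ 6795364580.679, the "start" from the log
	fmt.Println(float64(endNs) / 1e9)   // ≈ -6795364579.680, the "end" from the log
}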

@DanielQujun DanielQujun changed the title [m3query] suspicious timestamp causeing prometheus remote read query fail [m3query] suspicious timestamp causing prometheus remote read query fail Jul 20, 2020
@arnikola
Collaborator

Trying to repro this locally but no luck. Which m3 version are you using and how exactly are you calling the query?

@DanielQujun
Author

@arnikola Hi, thanks for your reply. My m3 version is v0.15.0:

/ # /bin/m3coordinator --version
2020/07/21 01:22:41 Go Runtime version: go1.13.12
2020/07/21 01:22:41 Build Version:      v0.15.0
2020/07/21 01:22:41 Build Revision:     dcef4ff7b
2020/07/21 01:22:41 Build Branch:       HEAD
2020/07/21 01:22:41 Build Date:         2020-06-03-16:54:54
2020/07/21 01:22:41 Build TimeUnix:     1591203294

and the Prometheus config is:

global:
  scrape_interval: 10s
  scrape_timeout: 10s
  evaluation_interval: 10s
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 192.168.0.221:9093
    scheme: http
    timeout: 10s
    api_version: v1
remote_read:
- url: http://192.168.0.40:7201/api/v1/prom/remote/read
  remote_timeout: 1m
  read_recent: true

You might need to delete the Prometheus local data cache, because Prometheus merges the local results with the remote ones.
I tried querying other labels, e.g. http://192.168.0.35:9090/api/v1/series?match[]={__name__=%22node_disk_io_now%22}, and that returns no data either.

@DanielQujun
Author

BTW, the Prometheus version is 2.12.2.

@arnikola
Collaborator

That's an old m3 version; you should update it. BTW, if you change read_recent to false you won't get results from the Prometheus local cache.
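
For reference, a minimal sketch of that change against the remote_read block posted above (same endpoint as in your config); per the Prometheus docs, read_recent controls whether remote reads are also issued for time ranges that local storage should have complete data for:

remote_read:
- url: http://192.168.0.40:7201/api/v1/prom/remote/read
  remote_timeout: 1m
  # read_recent: false skips remote reads for ranges that local storage
  # should already cover completely
  read_recent: false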

@DanielQujun
Author

DanielQujun commented Jul 21, 2020

Updated to v0.15.5; I still get the same symptom.

@arnikola
Collaborator

Just verifying: is this resolved by that PR? I haven't been able to repro it locally as of yet; I was going to ask if you have remote_write set up correctly in the Prometheus settings.

@DanielQujun
Author

DanielQujun commented Jul 23, 2020

Hi, this is my remote_write config:

remote_write:
- url: http://192.168.0.220:7201/api/v1/prom/remote/write
  remote_timeout: 30s
  queue_config:
    capacity: 100000
    max_shards: 1000
    min_shards: 1
    max_samples_per_send: 100
    batch_send_deadline: 5s
    min_backoff: 30ms
    max_backoff: 100ms

I think the data exists in M3DB: I can get metric data through the Prometheus /api/v1/query and /api/v1/query_range APIs, and also through the /api/v1/series API if I add the start and end query strings to the URL. It only fails without start or end.

@DanielQujun
Author

And yes, that PR would resolve it, but I'm not sure it's a good idea. 😳

@DanielQujun
Author

FYI, here are my m3 configs. I'm not experienced with M3DB; most of these were copied from the m3 docs 😳, so as you suspected, I might have misconfigured something:

  m3coordinator.yml: |
    listenAddress:
      type: "config"
      value: "0.0.0.0:7201"
    logging:
      level: info
    metrics:
      scope:
        prefix: "coordinator"
      prometheus:
        handlerPath: /metrics
        listenAddress: 0.0.0.0:7203
      sanitization: prometheus
      samplingRate: 1.0
      extended: none
    tagOptions:
      idScheme: quoted

    clusters:
    - namespaces:
          - namespace: default
            type: unaggregated
            retention: 48h
      client:
        writeConsistencyLevel: majority
        readConsistencyLevel: unstrict_majority
        config:
          service:
              env: default_env
              zone: embedded
              service: m3db
              cacheDir: /var/lib/m3kv
              etcdClusters:
              - zone: embedded
                endpoints:
                - https://192.168.0.28:4001
                - https://192.168.0.29:4001
                - https://192.168.0.31:4001
                tls:
                  crtPath: /etc/kubernetes/pki/server/server.pem
                  caCrtPath: /etc/kubernetes/pki/ca.pem
                  keyPath: /etc/kubernetes/pki/server/server-key.pem
---
  m3dbnode.yml: |
    db:
      logging:
        level: info

      metrics:
        prometheus:
          handlerPath: /metrics
        sanitization: prometheus
        samplingRate: 1.0
        extended: detailed

      listenAddress: 0.0.0.0:9005
      clusterListenAddress: 0.0.0.0:9006
      httpNodeListenAddress: 0.0.0.0:9002
      httpClusterListenAddress: 0.0.0.0:9003
      debugListenAddress: 0.0.0.0:9004

      hostID:
        resolver: environment
        envVarName: M3DB_HOST_ID

      client:
        writeConsistencyLevel: majority
        readConsistencyLevel: unstrict_majority

      gcPercentage: 100

      writeNewSeriesAsync: true
      writeNewSeriesLimitPerSecond: 1048576
      writeNewSeriesBackoffDuration: 2ms

      bootstrap:
        bootstrappers:
            - filesystem
            - commitlog
            - peers
            - uninitialized_topology
        commitlog:
          returnUnfulfilledForCorruptCommitLogFiles: false

      commitlog:
        flushMaxBytes: 524288
        flushEvery: 1s
        queue:
          calculationType: fixed
          size: 2097152

      fs:
        filePathPrefix: /var/lib/m3db

      config:
        service:
            env: default_env
            zone: embedded
            service: m3db
            cacheDir: /var/lib/m3kv
            etcdClusters:
            - zone: embedded
              endpoints:
              - https://192.168.0.28:4001
              - https://192.168.0.29:4001
              - https://192.168.0.31:4001
              tls:
                crtPath: /etc/kubernetes/pki/server/server.pem
                caCrtPath: /etc/kubernetes/pki/ca.pem
                keyPath: /etc/kubernetes/pki/server/server-key.pem
---
  m3query-config.yaml: |
    listenAddress:
      value: "0.0.0.0:7212"

    logging:
      level: info

    metrics:
      scope:
        prefix: "coordinator"
      prometheus:
        handlerPath: /metrics
        listenAddress: 0.0.0.0:7213 # until https://github.com/m3db/m3/issues/682 is resolved
      sanitization: prometheus
      samplingRate: 1.0
      extended: none

    tagOptions:
      idScheme: quoted

    clusters:
      - namespaces:
          - namespace: default
            type: unaggregated
            retention: 48h
        client:
          config:
            service:
              env: default_env
              zone: embedded
              service: m3db
              cacheDir: /var/lib/m3kv
              etcdClusters:
                - zone: embedded
                  endpoints:
                  - https://192.168.0.28:4001
                  - https://192.168.0.29:4001
                  - https://192.168.0.31:4001
                  tls:
                    crtPath: /etc/kubernetes/pki/server/server.pem
                    caCrtPath: /etc/kubernetes/pki/ca.pem
                    keyPath: /etc/kubernetes/pki/server/server-key.pem
          writeConsistencyLevel: majority
          readConsistencyLevel: unstrict_majority
          writeTimeout: 10s
          fetchTimeout: 15s
          connectTimeout: 20s
          writeRetry:
            initialBackoff: 500ms
            backoffFactor: 3
            maxRetries: 2
            jitter: true
          fetchRetry:
            initialBackoff: 500ms
            backoffFactor: 2
            maxRetries: 3
            jitter: true
          backgroundHealthCheckFailLimit: 4
          backgroundHealthCheckFailThrottleFactor: 0.5

@arnikola
Collaborator

arnikola commented Jul 23, 2020

Sorry, it auto-closed; leaving this open until it's verified as solved.

@DanielQujun
Author

Thanks for your great work! I'll verify it as soon as possible.

@DanielQujun
Author

Verified, closing.
