
values are out of order from inner iterator after changing blockSize #2789

Closed
edgarasg opened this issue Oct 22, 2020 · 7 comments

@edgarasg (Contributor) commented Oct 22, 2020

General Issues

After changing blockSize from 24h to 3h (we deleted the namespace, created a new one with the same name but a different blockSize, and restarted all dbnodes one by one), we are now getting "values are out of order from inner iterator" errors on reads from Prometheus. Our m3coordinator instances are load balanced using haproxy.

Also, after changing blockSize and restarting a dbnode we got this error:

m3dbnode[14558]: {"level":"error","ts":1603308853.9480052,"msg":"unknown error","bootstrapper":"filesystem","error":"fulfilled range {26: 24h0m0s} is outside of index block range: (2020-10-15 00:00:00 +0000 UTC,2020-10-15 03:00:00 +0000 UTC)","timeRangeStart":1602720000}
  1. What service is experiencing the issue? (M3Coordinator, M3DB, M3Aggregator, etc)
  • M3DB
  2. What is the configuration of the service? Please include any YAML files, as well as namespace / placement configuration (with any sensitive information anonymized if necessary).
cat /etc/m3db/dbnode.yml

db:
  logging:
    level: info

  metrics:
    prometheus:
      handlerPath: /metrics
      listenAddress: '[::]:7202'
    sanitization: prometheus
    samplingRate: 1.0
    extended: detailed

  hostID:
    resolver: hostname

  config:
    service:
      env: default_env
      zone: embedded
      service: m3db
      cacheDir: /var/lib/m3kv
      etcdClusters:
        - zone: embedded
          endpoints:

  listenAddress: '[::]:9000'
  clusterListenAddress: '[::]:9001'
  httpNodeListenAddress: '[::]:9002'
  httpClusterListenAddress: '[::]:9003'

  client:
    writeConsistencyLevel: majority
    readConsistencyLevel: unstrict_majority

  gcPercentage: 100

  writeNewSeriesAsync: true
  writeNewSeriesLimitPerSecond: 10000
  writeNewSeriesBackoffDuration: 2ms

  bootstrap:
    bootstrappers:
      - filesystem
      - commitlog
      - peers
      - uninitialized_topology
    commitlog:
      returnUnfulfilledForCorruptCommitLogFiles: false

  cache:
    series:
      policy: none
    postingsList:
      size: 262144

  commitlog:
    flushMaxBytes: 524288
    flushEvery: 1s
    queue:
      calculationType: fixed
      size: 2097152

  fs:
    filePathPrefix: /data/m3
cat /etc/m3db/coordinator.yml

listenAddress:
  value: '[::]:7201'

logging:
  level: info

metrics:
  scope:
    prefix: coordinator
  prometheus:
    handlerPath: /metrics
    listenAddress: '[::]:7203'
  sanitization: prometheus
  samplingRate: 1.0
  extended: none

tagOptions:
  idScheme: quoted

clusters:
  - namespaces:
      - namespace: 31d_prometheus
        retention: 744h
        type: unaggregated
    client:
      config:
        service:
          env: default_env
          zone: embedded
          service: m3db
          cacheDir: /var/lib/m3kv
          etcdClusters:
            - zone: embedded
              endpoints:

      writeConsistencyLevel: majority
      readConsistencyLevel: unstrict_majority
{
  "registry": {
    "namespaces": {
      "31d_prometheus": {
        "aggregationOptions": null,
        "bootstrapEnabled": true,
        "cacheBlocksOnRetrieve": true,
        "cleanupEnabled": true,
        "coldWritesEnabled": false,
        "extendedOptions": null,
        "flushEnabled": true,
        "indexOptions": {
          "blockSizeDuration": "3h0m0s",
          "enabled": true
        },
        "repairEnabled": false,
        "retentionOptions": {
          "blockDataExpiry": true,
          "blockDataExpiryAfterNotAccessPeriodDuration": "5m0s",
          "blockSizeDuration": "3h0m0s",
          "bufferFutureDuration": "2m0s",
          "bufferPastDuration": "10m0s",
          "futureRetentionPeriodDuration": "0s",
          "retentionPeriodDuration": "744h0m0s"
        },
        "runtimeOptions": null,
        "schemaOptions": null,
        "snapshotEnabled": true,
        "writesToCommitLog": true
      }
    }
  }
}
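
(For reference, a registry dump in this shape is what the coordinator's namespace endpoint returns; the port is the listenAddress from coordinator.yml above, and piping through jq is just for pretty-printing:)

curl http://localhost:7201/api/v1/services/m3db/namespace | jq .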
  3. How are you using the service? For example, are you performing read/writes to the service via Prometheus, or are you using a custom script?
  • We are using a single Prometheus with remote_write and remote_read.
  4. Is there a reliable way to reproduce the behavior? If so, please provide detailed instructions.
  • I suppose changing blockSize on an existing namespace; see the sketch below.
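
To make that concrete: the namespace swap from the original report maps onto the m3coordinator namespace API. This is only a sketch, assuming the coordinator is reachable on localhost:7201 and reusing the options from the registry dump above; note that deleting a namespace drops its data, so only run this against a test cluster:

# Delete the existing namespace (this removes its data from the cluster).
curl -X DELETE http://localhost:7201/api/v1/services/m3db/namespace/31d_prometheus

# Recreate it under the same name, changing only the block sizes (24h -> 3h).
curl -X POST http://localhost:7201/api/v1/services/m3db/namespace -d '{
  "name": "31d_prometheus",
  "options": {
    "bootstrapEnabled": true,
    "flushEnabled": true,
    "writesToCommitLog": true,
    "cleanupEnabled": true,
    "snapshotEnabled": true,
    "repairEnabled": false,
    "retentionOptions": {
      "retentionPeriodDuration": "744h",
      "blockSizeDuration": "3h",
      "bufferFutureDuration": "2m",
      "bufferPastDuration": "10m",
      "blockDataExpiry": true,
      "blockDataExpiryAfterNotAccessPeriodDuration": "5m"
    },
    "indexOptions": {
      "enabled": true,
      "blockSizeDuration": "3h"
    }
  }
}'

# Restarting the dbnodes one by one should then reproduce the bootstrap error
# quoted above: the 24h blocks already on disk no longer line up with the 3h
# blocks the recreated namespace expects.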
@wang1219 commented

The same issue here. How do we solve it?

@wang1219 commented

@gibbscullen @arnikola Is there any conclusion or solution to this issue? At present, my cluster cannot find data if the query range exceeds 24 hours.

I changed the blockSizeDuration from 8h to 4h and restarted all m3db nodes. When I noticed this problem, I changed the blockSizeDuration back to 8h, but that did not solve it. Help, thanks.

@gibbscullen (Collaborator) commented

@edgarasg are you able to share what resolved this issue for you?

@edgarasg (Contributor, Author) commented Dec 28, 2020 via email

@linasm (Collaborator) commented Jan 18, 2021

@edgarasg, @wang1219 it is not possible to change the block size of an existing namespace: https://m3db.io/docs/operational_guide/namespace_configuration/#modifying-a-namespace
The reason is that block size is a fundamental parameter controlling the internal layout of encoded data at a very deep level.
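
Paraphrasing the linked guide: migrate instead of mutating. Create a replacement namespace with the desired block size (the POST sketched under question 4 works, changing only the name), repoint the coordinator at it, and let the old data age out or backfill it separately. A minimal sketch of the coordinator side, using the hypothetical replacement name 31d_prometheus_3h:

# coordinator.yml: serve reads/writes from the replacement namespace.
clusters:
  - namespaces:
      - namespace: 31d_prometheus_3h   # hypothetical new namespace with 3h blocks
        retention: 744h
        type: unaggregated

On M3 versions that stage newly created namespaces, the replacement also has to be marked ready (POST to /api/v1/services/m3db/namespace/ready) before the coordinator will serve it.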

@edgarasg (Contributor, Author) commented

That is what I expected. We can close this issue for now, I suppose.

linasm closed this as completed Jan 18, 2021
@wang1219 commented

@linasm I modified the blockSize because of another issue, https://github.com/m3db/m3/issues/3032; I want the cluster to update the index faster.
