[dbnode] Update return unfulfilled for corrupt commit log files default #2807

Merged
3 changes: 0 additions & 3 deletions config/m3db/clustered-etcd/generated.yaml
@@ -19,9 +19,6 @@
"tagOptions":
"idScheme": "quoted"
"db":
-"bootstrap":
-"commitlog":
-"returnUnfulfilledForCorruptCommitLogFiles": false
"cache":
"postingsList":
"size": 262144
5 changes: 0 additions & 5 deletions config/m3db/clustered-etcd/m3dbnode.libsonnet
@@ -103,11 +103,6 @@ function(cluster, coordinator={}, db={}) {
"writeNewSeriesAsync": true,
"writeNewSeriesLimitPerSecond": 1048576,
"writeNewSeriesBackoffDuration": "2ms",
-"bootstrap": {
-"commitlog": {
-"returnUnfulfilledForCorruptCommitLogFiles": false
-}
-},
"cache": {
"series": {
"policy": "lru"
3 changes: 0 additions & 3 deletions config/m3db/local-etcd/generated.yaml
@@ -19,9 +19,6 @@
"tagOptions":
"idScheme": "quoted"
"db":
-"bootstrap":
-"commitlog":
-"returnUnfulfilledForCorruptCommitLogFiles": false
"cache":
"postingsList":
"size": 262144
5 changes: 0 additions & 5 deletions config/m3db/local-etcd/m3dbnode.libsonnet
@@ -62,11 +62,6 @@ function(coordinator={}, db={}) {
"writeNewSeriesAsync": true,
"writeNewSeriesLimitPerSecond": 1048576,
"writeNewSeriesBackoffDuration": "2ms",
-"bootstrap": {
-"commitlog": {
-"returnUnfulfilledForCorruptCommitLogFiles": false
-}
-},
"cache": {
"series": {
"policy": "lru"
4 changes: 0 additions & 4 deletions kube/bundle.yaml

Some generated files are not rendered by default.

4 changes: 0 additions & 4 deletions kube/m3dbnode-configmap.yaml
@@ -56,10 +56,6 @@ data:
writeNewSeriesLimitPerSecond: 1048576
writeNewSeriesBackoffDuration: 2ms

-bootstrap:
-commitlog:
-returnUnfulfilledForCorruptCommitLogFiles: false
-
commitlog:
flushMaxBytes: 524288
flushEvery: 1s
2 changes: 1 addition & 1 deletion kube/terraform/main.tf
@@ -133,7 +133,7 @@ resource "kubernetes_config_map" "m3dbnode_config" {
namespace = "m3db"
}
data {
m3dbnode.yml = "coordinator:\n listenAddress: \"0.0.0.0:7201\"\n local:\n namespaces:\n - namespace: default\n type: unaggregated\n retention: 48h\n metrics:\n scope:\n prefix: \"coordinator\"\n prometheus:\n handlerPath: /metrics\n listenAddress: 0.0.0.0:7203\n sanitization: prometheus\n samplingRate: 1.0\n extended: none\n tagOptions:\n idScheme: quoted\n\ndb:\n logging:\n level: info\n\n metrics:\n prometheus:\n handlerPath: /metrics\n sanitization: prometheus\n samplingRate: 1.0\n extended: detailed\n\n listenAddress: 0.0.0.0:9000\n clusterListenAddress: 0.0.0.0:9001\n httpNodeListenAddress: 0.0.0.0:9002\n httpClusterListenAddress: 0.0.0.0:9003\n debugListenAddress: 0.0.0.0:9004\n\n hostID:\n resolver: hostname\n\n client:\n writeConsistencyLevel: majority\n readConsistencyLevel: unstrict_majority\n\n gcPercentage: 100\n\n writeNewSeriesAsync: true\n writeNewSeriesLimitPerSecond: 1048576\n writeNewSeriesBackoffDuration: 2ms\n\n bootstrap:\n filesystem:\n numProcessorsPerCPU: 0.125\n commitlog:\n returnUnfulfilledForCorruptCommitLogFiles: false\n\n commitlog:\n flushMaxBytes: 524288\n flushEvery: 1s\n queue:\n calculationType: fixed\n size: 2097152\n\n filesystem:\n filePathPrefix: /var/lib/m3db\n\n config:\n service:\n env: default_env\n zone: embedded\n service: m3db\n cacheDir: /var/lib/m3kv\n etcdClusters:\n - zone: embedded\n endpoints:\n - http://etcd-0.etcd:2379\n - http://etcd-1.etcd:2379\n - http://etcd-2.etcd:2379\n"
m3dbnode.yml = "coordinator:\n listenAddress: \"0.0.0.0:7201\"\n local:\n namespaces:\n - namespace: default\n type: unaggregated\n retention: 48h\n metrics:\n scope:\n prefix: \"coordinator\"\n prometheus:\n handlerPath: /metrics\n listenAddress: 0.0.0.0:7203\n sanitization: prometheus\n samplingRate: 1.0\n extended: none\n tagOptions:\n idScheme: quoted\n\ndb:\n logging:\n level: info\n\n metrics:\n prometheus:\n handlerPath: /metrics\n sanitization: prometheus\n samplingRate: 1.0\n extended: detailed\n\n listenAddress: 0.0.0.0:9000\n clusterListenAddress: 0.0.0.0:9001\n httpNodeListenAddress: 0.0.0.0:9002\n httpClusterListenAddress: 0.0.0.0:9003\n debugListenAddress: 0.0.0.0:9004\n\n hostID:\n resolver: hostname\n\n client:\n writeConsistencyLevel: majority\n readConsistencyLevel: unstrict_majority\n\n gcPercentage: 100\n\n writeNewSeriesAsync: true\n writeNewSeriesLimitPerSecond: 1048576\n writeNewSeriesBackoffDuration: 2ms\n\n commitlog:\n flushMaxBytes: 524288\n flushEvery: 1s\n queue:\n calculationType: fixed\n size: 2097152\n\n filesystem:\n filePathPrefix: /var/lib/m3db\n\n config:\n service:\n env: default_env\n zone: embedded\n service: m3db\n cacheDir: /var/lib/m3kv\n etcdClusters:\n - zone: embedded\n endpoints:\n - http://etcd-0.etcd:2379\n - http://etcd-1.etcd:2379\n - http://etcd-2.etcd:2379\n"
}
}

4 changes: 0 additions & 4 deletions scripts/development/m3_stack/m3dbnode.yml
@@ -38,10 +38,6 @@ db:
writeNewSeriesLimitPerSecond: 1048576
writeNewSeriesBackoffDuration: 2ms

-bootstrap:
-commitlog:
-returnUnfulfilledForCorruptCommitLogFiles: false
-
cache:
series:
policy: lru
@@ -48,10 +48,6 @@ db:
writeNewSeriesLimitPerSecond: 1048576
writeNewSeriesBackoffDuration: 2ms

-bootstrap:
-commitlog:
-returnUnfulfilledForCorruptCommitLogFiles: false
-
cache:
series:
policy: lru
@@ -59,10 +59,6 @@ db:
writeNewSeriesLimitPerSecond: 1048576
writeNewSeriesBackoffDuration: 2ms

-bootstrap:
-commitlog:
-returnUnfulfilledForCorruptCommitLogFiles: false
-
cache:
series:
policy: lru
@@ -38,10 +38,6 @@ db:
writeNewSeriesLimitPerSecond: 1048576
writeNewSeriesBackoffDuration: 2ms

-bootstrap:
-commitlog:
-returnUnfulfilledForCorruptCommitLogFiles: false
-
cache:
series:
policy: lru
4 changes: 0 additions & 4 deletions scripts/docker-integration-tests/repair/m3dbnode.yml
@@ -38,10 +38,6 @@ db:
writeNewSeriesLimitPerSecond: 1048576
writeNewSeriesBackoffDuration: 2ms

-bootstrap:
-commitlog:
-returnUnfulfilledForCorruptCommitLogFiles: false
-
cache:
series:
policy: lru
@@ -38,10 +38,6 @@ db:
writeNewSeriesLimitPerSecond: 1048576
writeNewSeriesBackoffDuration: 2ms

-bootstrap:
-commitlog:
-returnUnfulfilledForCorruptCommitLogFiles: false
-
cache:
series:
policy: lru
@@ -38,10 +38,6 @@ db:
writeNewSeriesLimitPerSecond: 1048576
writeNewSeriesBackoffDuration: 2ms

-bootstrap:
-commitlog:
-returnUnfulfilledForCorruptCommitLogFiles: false
-
cache:
series:
policy: lru
@@ -38,10 +38,6 @@ db:
writeNewSeriesLimitPerSecond: 1048576
writeNewSeriesBackoffDuration: 2ms

-bootstrap:
-commitlog:
-returnUnfulfilledForCorruptCommitLogFiles: false
-
cache:
series:
policy: lru
@@ -38,10 +38,6 @@ db:
writeNewSeriesLimitPerSecond: 1048576
writeNewSeriesBackoffDuration: 2ms

-bootstrap:
-commitlog:
-returnUnfulfilledForCorruptCommitLogFiles: false
-
cache:
series:
policy: lru
@@ -68,7 +68,7 @@ This value can be set much lower than the default value for workloads in which a
### Ignoring Corrupt Commitlogs on Bootstrap

If M3DB is shut down gracefully (i.e., via SIGTERM), it will ensure that all pending writes are flushed to the commitlog on disk before the process exits.
-However, in situations where the process crashed/exited unexpectedly or the node itself experienced a sudden failure, the tail end of the commitlog may be corrupt.
+However, in situations where SIGKILL is used, the process exits unexpectedly, or the node itself experiences a sudden failure, the tail end of the commitlog may be corrupt.
In such situations, M3DB will read as much of the commitlog as possible in an attempt to recover the maximum amount of data. However, it then needs to make a decision: it can either **(a)** come up successfully and tolerate an ostensibly minor amount of data loss, or **(b)** attempt to stream the missing data from its peers.
This behavior is controlled by the following default configuration:

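To make the trade-off between options (a) and (b) concrete, here is a minimal Go sketch, not M3DB's actual bootstrapper. The `bootstrapResult` type, the `bootstrapFromCommitLog` function, and the use of a nil entry to stand in for a corrupt record are all assumptions made for illustration. It reads entries up to the first corrupt one, then uses a boolean modeled on `returnUnfulfilledForCorruptCommitLogFiles` to decide whether the range is reported as fulfilled.

```go
package main

import "fmt"

// bootstrapResult is a hypothetical summary of one commit log bootstrap pass.
type bootstrapResult struct {
	// Fulfilled reports whether the commit log source claims the requested
	// range. When false, the missing data would have to come from elsewhere,
	// for example by streaming from peers.
	Fulfilled bool
	// EntriesRecovered counts the entries read before any corruption.
	EntriesRecovered int
}

// bootstrapFromCommitLog reads entries until the first corrupt one and then
// applies returnUnfulfilled to decide how the range is reported.
// A nil entry stands in for a corrupt record in this sketch.
func bootstrapFromCommitLog(entries []*string, returnUnfulfilled bool) bootstrapResult {
	res := bootstrapResult{Fulfilled: true}
	for _, e := range entries {
		if e == nil {
			// Corruption found: keep what was read so far and stop reading.
			res.Fulfilled = !returnUnfulfilled
			break
		}
		res.EntriesRecovered++
	}
	return res
}

func main() {
	a, b := "write-1", "write-2"
	entries := []*string{&a, &b, nil} // two good writes, then a corrupt tail

	// Option (a): come up and tolerate the loss of the corrupt tail.
	fmt.Printf("%+v\n", bootstrapFromCommitLog(entries, false))
	// Option (b): report the range as unfulfilled so peers can supply it.
	fmt.Printf("%+v\n", bootstrapFromCommitLog(entries, true))
}
```

In both cases the two readable entries are recovered; the flag only changes whether the node treats the range as complete or leaves it open so the missing data can be streamed from peers.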
@@ -55,10 +55,6 @@ db:
writeNewSeriesLimitPerSecond: 1048576
writeNewSeriesBackoffDuration: 2ms

-bootstrap:
-commitlog:
-returnUnfulfilledForCorruptCommitLogFiles: false
-
cache:
series:
policy: lru
1 change: 1 addition & 0 deletions src/dbnode/config/m3dbnode-all-config.yml
@@ -105,6 +105,7 @@ db:

bootstrap:
commitlog:
+# Whether the tail end of corrupted commit logs causes an error on bootstrap.
returnUnfulfilledForCorruptCommitLogFiles: false

cache:
4 changes: 0 additions & 4 deletions src/dbnode/config/m3dbnode-cluster-template.yml
@@ -76,10 +76,6 @@ db:
writeNewSeriesLimitPerSecond: 1048576
writeNewSeriesBackoffDuration: 2ms

-bootstrap:
-commitlog:
-returnUnfulfilledForCorruptCommitLogFiles: false
-
cache:
series:
policy: lru
4 changes: 0 additions & 4 deletions src/dbnode/config/m3dbnode-local-etcd-proto.yml
@@ -55,10 +55,6 @@ db:
writeNewSeriesLimitPerSecond: 1048576
writeNewSeriesBackoffDuration: 2ms

-bootstrap:
-commitlog:
-returnUnfulfilledForCorruptCommitLogFiles: false
-
cache:
series:
policy: lru
4 changes: 0 additions & 4 deletions src/dbnode/config/m3dbnode-local-etcd.yml
@@ -55,10 +55,6 @@ db:
writeNewSeriesLimitPerSecond: 1048576
writeNewSeriesBackoffDuration: 2ms

-bootstrap:
-commitlog:
-returnUnfulfilledForCorruptCommitLogFiles: false
-
cache:
series:
policy: lru
@@ -34,7 +34,7 @@ const (
// DefaultReturnUnfulfilledForCorruptCommitLogFiles is the default
// value for whether to return unfulfilled when encountering corrupt
// commit log files.
-DefaultReturnUnfulfilledForCorruptCommitLogFiles = true
+DefaultReturnUnfulfilledForCorruptCommitLogFiles = false
Collaborator

why is this the default we want?

Collaborator Author

)

var (
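The config files in this change drop their explicit `returnUnfulfilledForCorruptCommitLogFiles: false`, which keeps behavior the same only because the compiled-in default is now `false` as well. A minimal sketch of that fallback follows. It assumes a hypothetical `commitLogConfig` struct with an optional field, and it decodes JSON from the standard library in place of YAML so the example stays dependency-free; none of these names come from the M3 codebase.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// defaultReturnUnfulfilled mirrors the new default value from this change.
const defaultReturnUnfulfilled = false

// commitLogConfig is a hypothetical config section. A nil pointer means the
// key was omitted from the config file.
type commitLogConfig struct {
	ReturnUnfulfilledForCorruptCommitLogFiles *bool `json:"returnUnfulfilledForCorruptCommitLogFiles"`
}

// resolve returns the explicit value when one was configured and falls back
// to the default otherwise.
func (c commitLogConfig) resolve() bool {
	if c.ReturnUnfulfilledForCorruptCommitLogFiles == nil {
		return defaultReturnUnfulfilled
	}
	return *c.ReturnUnfulfilledForCorruptCommitLogFiles
}

// parse decodes a config fragment and returns the effective setting.
func parse(raw string) (bool, error) {
	var c commitLogConfig
	if err := json.Unmarshal([]byte(raw), &c); err != nil {
		return false, err
	}
	return c.resolve(), nil
}

func main() {
	fragments := []string{
		`{}`, // key omitted: the default applies
		`{"returnUnfulfilledForCorruptCommitLogFiles": true}`, // explicit opt-in
	}
	for _, raw := range fragments {
		v, err := parse(raw)
		fmt.Println(v, err)
	}
}
```

Under this model, a deployment that wants the previous behavior has to set the key to `true` explicitly, since omitting it now resolves to `false`.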
@@ -134,7 +134,8 @@ func TestCommitLogSourcePropCorrectlyBootstrapsFromCommitlog(t *testing.T) {
SetStrategy(commitlog.StrategyWriteBehind).
SetFlushInterval(time.Millisecond).
SetClockOptions(testCommitlogOpts.ClockOptions().SetNowFn(nowFn))
-bootstrapOpts = testDefaultOpts.SetCommitLogOptions(commitLogOpts)
+bootstrapOpts = testDefaultOpts.SetCommitLogOptions(commitLogOpts).
+SetReturnUnfulfilledForCorruptCommitLogFiles(true)

start = input.currentTime.Truncate(blockSize)
)
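Since the default is now `false`, a test that depends on the old behavior has to opt back in explicitly, as the hunk above does. The chained setter style can be sketched as follows; the `options` type and its methods are simplified, hypothetical stand-ins rather than the real bootstrapper options.

```go
package main

import "fmt"

// options is a hypothetical, simplified stand-in for bootstrapper options.
type options struct {
	returnUnfulfilled bool
}

// newOptions returns options carrying the default assumed from this change.
func newOptions() options {
	return options{returnUnfulfilled: false}
}

// SetReturnUnfulfilled returns a modified copy, so shared defaults are never
// mutated by a caller that overrides them.
func (o options) SetReturnUnfulfilled(v bool) options {
	o.returnUnfulfilled = v
	return o
}

// ReturnUnfulfilled reports the configured value.
func (o options) ReturnUnfulfilled() bool {
	return o.returnUnfulfilled
}

func main() {
	defaults := newOptions()
	// A test that needs the previous behavior overrides the default.
	testOpts := defaults.SetReturnUnfulfilled(true)
	fmt.Println(defaults.ReturnUnfulfilled(), testOpts.ReturnUnfulfilled())
}
```

Because each setter returns a copy, overriding the flag for one test leaves the shared defaults at `false` for every other caller.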