It takes 2 hours for tikv failover #362
Comments
Now it will fail over after becoming @tennix @xiaojingchen PTAL
How does PD handle a TiKV failure, and in particular, when does PD begin to schedule data on the failed store? @nolouch
@zyguan Can you show the PD config? pd-ctl>> config show all
@nolouch here it is.
{
  "client-urls": "http://0.0.0.0:2379",
  "peer-urls": "http://0.0.0.0:2380",
  "advertise-client-urls": "http://demo-pd-1.demo-pd-peer.test-calico-ipip.svc:2379",
  "advertise-peer-urls": "http://demo-pd-1.demo-pd-peer.test-calico-ipip.svc:2380",
  "name": "demo-pd-1",
  "data-dir": "/var/lib/pd",
  "initial-cluster": "demo-pd-1=http://demo-pd-1.demo-pd-peer.test-calico-ipip.svc:2380",
  "initial-cluster-state": "new",
  "join": "",
  "lease": 3,
  "log": {
    "level": "info",
    "format": "text",
    "disable-timestamp": false,
    "file": {
      "filename": "",
      "log-rotate": true,
      "max-size": 0,
      "max-days": 0,
      "max-backups": 0
    }
  },
  "log-file": "",
  "log-level": "",
  "tso-save-interval": "3s",
  "metric": {
    "job": "demo-pd-1",
    "address": "",
    "interval": "15s"
  },
  "schedule": {
    "max-snapshot-count": 3,
    "max-pending-peer-count": 16,
    "max-merge-region-size": 0,
    "max-merge-region-keys": 0,
    "split-merge-interval": "1h0m0s",
    "patrol-region-interval": "100ms",
    "max-store-down-time": "1h0m0s",
    "leader-schedule-limit": 4,
    "region-schedule-limit": 4,
    "replica-schedule-limit": 8,
    "merge-schedule-limit": 8,
    "tolerant-size-ratio": 5,
    "low-space-ratio": 0.8,
    "high-space-ratio": 0.6,
    "disable-raft-learner": "false",
    "disable-remove-down-replica": "false",
    "disable-replace-offline-replica": "false",
    "disable-make-up-replica": "false",
    "disable-remove-extra-replica": "false",
    "disable-location-replacement": "false",
    "disable-namespace-relocation": "false",
    "schedulers-v2": [
      {
        "type": "balance-region",
        "args": null,
        "disable": false
      },
      {
        "type": "balance-leader",
        "args": null,
        "disable": false
      },
      {
        "type": "hot-region",
        "args": null,
        "disable": false
      },
      {
        "type": "label",
        "args": null,
        "disable": false
      }
    ]
  },
  "replication": {
    "max-replicas": 3,
    "location-labels": "zone,rack,host"
  },
  "namespace": {},
  "cluster-version": "2.1.3",
  "quota-backend-bytes": "0 B",
  "auto-compaction-mode": "periodic",
  "auto-compaction-retention-v2": "1h",
  "TickInterval": "500ms",
  "ElectionInterval": "3s",
  "PreVote": true,
  "security": {
    "cacert-path": "",
    "cert-path": "",
    "key-path": ""
  },
  "label-property": {},
  "WarningMsgs": null,
  "namespace-classifier": "table"
}
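For readers following along, the setting in this dump that matters for the question is `schedule.max-store-down-time`. A minimal sketch of pulling it out of such a dump is below. The helper names `parse_go_duration` and `max_store_down_time` are hypothetical, the JSON is only an excerpt of the config above, and the parser handles only the simple `h`/`m`/`s`/`ms` forms that appear in this dump, not every duration string Go accepts.

```python
import json
import re
from datetime import timedelta

# Excerpt of the PD config posted above (only the field of interest).
PD_CONFIG_EXCERPT = '{"schedule": {"max-store-down-time": "1h0m0s"}}'

# Microseconds per unit; "ms" must be tried before "m" in the regex below.
_UNIT_MICROSECONDS = {
    "h": 3_600_000_000,
    "m": 60_000_000,
    "s": 1_000_000,
    "ms": 1_000,
}
_PART = re.compile(r"(\d+)(ms|h|m|s)")
_WHOLE = re.compile(r"(?:\d+(?:ms|h|m|s))+")


def parse_go_duration(text: str) -> timedelta:
    """Parse a simple Go-style duration such as "1h0m0s" or "100ms"."""
    if not _WHOLE.fullmatch(text):
        raise ValueError(f"unsupported duration string: {text!r}")
    total = 0
    for amount, unit in _PART.findall(text):
        total += int(amount) * _UNIT_MICROSECONDS[unit]
    return timedelta(microseconds=total)


def max_store_down_time(config_json: str) -> timedelta:
    """Return schedule.max-store-down-time from a PD config dump."""
    config = json.loads(config_json)
    return parse_go_duration(config["schedule"]["max-store-down-time"])


if __name__ == "__main__":
    print(max_store_down_time(PD_CONFIG_EXCERPT))
```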
@nolouch PTAL
We should failover when the TiKV instance becomes
So, the failover should be triggered in a short time after
Yes
When a TiKV store goes down, its state in PD first turns to Disconnected, then becomes Down after 1 hour. According to the failover logic here, it will take 2 hours for failover. Is this the expected behavior? It's a little misleading.
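The arithmetic behind the 2-hour figure can be sketched as follows. The 1-hour PD threshold matches `max-store-down-time` in the posted config. The second 1-hour wait on the operator side is an assumption inferred from the 2-hour total reported here, and the names `time_until_failover` and `operator_failover_period` are hypothetical, not taken from the operator's code.

```python
from datetime import timedelta


def time_until_failover(
    max_store_down_time: timedelta,
    operator_failover_period: timedelta,
) -> timedelta:
    """Total delay from a store going down to the operator failing over.

    Phase 1: PD keeps the store Disconnected until max_store_down_time
             has elapsed, then marks it Down.
    Phase 2: the operator waits operator_failover_period after the store
             is Down before it acts (ASSUMPTION, see the lead-in).
    """
    return max_store_down_time + operator_failover_period


if __name__ == "__main__":
    pd_threshold = timedelta(hours=1)   # "max-store-down-time": "1h0m0s"
    operator_wait = timedelta(hours=1)  # assumed, inferred from the issue
    print(time_until_failover(pd_threshold, operator_wait))  # prints 2:00:00
```

Under these assumptions, shortening either wait reduces the total, which is the direction the discussion in the comments takes.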