It takes 2 hours for tikv failover #362

Closed
zyguan opened this issue Apr 1, 2019 · 8 comments · Fixed by #368
Labels
test/stability (stability tests), type/bug (Something isn't working)

Comments

@zyguan
Contributor

zyguan commented Apr 1, 2019

When a TiKV store goes down, its state in PD first turns to Disconnected, then becomes Down after 1 hour. According to the failover logic here, the operator then waits another hour, so failover takes 2 hours in total. Is this the expected behavior? It's a little misleading.
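
To make the arithmetic concrete, here is a minimal sketch of the timeline described above. It is a model of the reported behavior, not tidb-operator's actual code: the function `failover_delay` and its `wait_after_down` parameter are hypothetical names introduced for illustration.

```python
from datetime import timedelta


def failover_delay(max_store_down_time: timedelta,
                   wait_after_down: timedelta) -> timedelta:
    """Time from store failure until failover is triggered.

    PD marks the store Down once it has been unreachable for
    max_store_down_time. The operator then waits wait_after_down
    (hypothetical parameter) before failing over.
    """
    return max_store_down_time + wait_after_down


one_hour = timedelta(hours=1)

# Reported behavior: the operator waits another full max-store-down-time.
print(failover_delay(one_hour, one_hour))  # 2:00:00
```

With the default `max-store-down-time` of 1 hour, the two waits add up to the 2 hours in the issue title.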

@zyguan zyguan added the type/question Further information is requested label Apr 1, 2019
@weekface
Contributor

weekface commented Apr 1, 2019

Currently, failover happens only after the store has been Down for 1 hour. The logic did not take the Disconnected state into account.

@tennix @xiaojingchen PTAL

@tennix
Member

tennix commented Apr 1, 2019

How does PD handle a TiKV failure? In particular, when does PD begin to schedule data on the failed store? @nolouch

@nolouch
Member

nolouch commented Apr 1, 2019

@zyguan Can you show the PD config? In pd-ctl: config show all

@zyguan
Contributor Author

zyguan commented Apr 1, 2019

@nolouch here it is.

{
  "client-urls": "http://0.0.0.0:2379",
  "peer-urls": "http://0.0.0.0:2380",
  "advertise-client-urls": "http://demo-pd-1.demo-pd-peer.test-calico-ipip.svc:2379",
  "advertise-peer-urls": "http://demo-pd-1.demo-pd-peer.test-calico-ipip.svc:2380",
  "name": "demo-pd-1",
  "data-dir": "/var/lib/pd",
  "initial-cluster": "demo-pd-1=http://demo-pd-1.demo-pd-peer.test-calico-ipip.svc:2380",
  "initial-cluster-state": "new",
  "join": "",
  "lease": 3,
  "log": {
    "level": "info",
    "format": "text",
    "disable-timestamp": false,
    "file": {
      "filename": "",
      "log-rotate": true,
      "max-size": 0,
      "max-days": 0,
      "max-backups": 0
    }
  },
  "log-file": "",
  "log-level": "",
  "tso-save-interval": "3s",
  "metric": {
    "job": "demo-pd-1",
    "address": "",
    "interval": "15s"
  },
  "schedule": {
    "max-snapshot-count": 3,
    "max-pending-peer-count": 16,
    "max-merge-region-size": 0,
    "max-merge-region-keys": 0,
    "split-merge-interval": "1h0m0s",
    "patrol-region-interval": "100ms",
    "max-store-down-time": "1h0m0s",
    "leader-schedule-limit": 4,
    "region-schedule-limit": 4,
    "replica-schedule-limit": 8,
    "merge-schedule-limit": 8,
    "tolerant-size-ratio": 5,
    "low-space-ratio": 0.8,
    "high-space-ratio": 0.6,
    "disable-raft-learner": "false",
    "disable-remove-down-replica": "false",
    "disable-replace-offline-replica": "false",
    "disable-make-up-replica": "false",
    "disable-remove-extra-replica": "false",
    "disable-location-replacement": "false",
    "disable-namespace-relocation": "false",
    "schedulers-v2": [
      {
        "type": "balance-region",
        "args": null,
        "disable": false
      },
      {
        "type": "balance-leader",
        "args": null,
        "disable": false
      },
      {
        "type": "hot-region",
        "args": null,
        "disable": false
      },
      {
        "type": "label",
        "args": null,
        "disable": false
      }
    ]
  },
  "replication": {
    "max-replicas": 3,
    "location-labels": "zone,rack,host"
  },
  "namespace": {},
  "cluster-version": "2.1.3",
  "quota-backend-bytes": "0 B",
  "auto-compaction-mode": "periodic",
  "auto-compaction-retention-v2": "1h",
  "TickInterval": "500ms",
  "ElectionInterval": "3s",
  "PreVote": true,
  "security": {
    "cacert-path": "",
    "cert-path": "",
    "key-path": ""
  },
  "label-property": {},
  "WarningMsgs": null,
  "namespace-classifier": "table"
}
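
The setting that matters for this issue is `schedule.max-store-down-time`, shown above as the Go-style duration string `1h0m0s`. As a rough sketch of reading it programmatically, the snippet below parses that value out of JSON shaped like the output above. The helper `parse_go_duration` is a hypothetical name and handles only the `h`, `m`, `s`, and `ms` units; the inline JSON is a trimmed stand-in for the full config.

```python
import json
import re
from datetime import timedelta


def parse_go_duration(text: str) -> timedelta:
    """Parse a Go-style duration such as "1h0m0s" or "100ms".

    Only the h, m, s and ms units are supported in this sketch.
    """
    parts = re.findall(r"(\d+(?:\.\d+)?)(ms|h|m|s)", text)
    # Reject input that contains anything besides number+unit pairs.
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    seconds_per_unit = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}
    total = sum(float(n) * seconds_per_unit[u] for n, u in parts)
    return timedelta(seconds=total)


# Trimmed stand-in for the `config show all` output.
config = json.loads('{"schedule": {"max-store-down-time": "1h0m0s"}}')
down_time = parse_go_duration(config["schedule"]["max-store-down-time"])
print(down_time)  # 1:00:00
```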

@weekface
Contributor

weekface commented Apr 3, 2019

@nolouch PTAL

@weekface
Contributor

weekface commented Apr 3, 2019

We should fail over as soon as the TiKV instance becomes Down. There is no need to wait another hour. @zyguan

@weekface weekface added type/bug Something isn't working and removed type/question Further information is requested labels Apr 3, 2019
@zyguan
Contributor Author

zyguan commented Apr 3, 2019

So failover should be triggered shortly after pd.maxStoreDownTime elapses, rather than after 2*pd.maxStoreDownTime.
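
A minimal sketch of that intended rule, under stated assumptions: the operator fails over as soon as PD reports the store as Down, with no extra wait of its own. The function names are hypothetical, and the 20-second `disconnect_after` default is an assumed value for illustration, not one taken from this thread. The real operator is written in Go and reads store state from PD.

```python
from datetime import timedelta


def pd_store_state(since_last_heartbeat: timedelta,
                   max_store_down_time: timedelta,
                   disconnect_after: timedelta = timedelta(seconds=20)) -> str:
    """Model of the store state PD reports (simplified)."""
    if since_last_heartbeat >= max_store_down_time:
        return "Down"
    if since_last_heartbeat >= disconnect_after:
        return "Disconnected"
    return "Up"


def should_failover(store_state: str) -> bool:
    """Fail over as soon as the store is Down, with no additional wait."""
    return store_state == "Down"


max_down = timedelta(hours=1)
for minutes in (0, 30, 61):
    state = pd_store_state(timedelta(minutes=minutes), max_down)
    print(minutes, state, should_failover(state))
```

Under this model a store that has been silent for 61 minutes is already eligible for failover, which matches "shortly after pd.maxStoreDownTime".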

@weekface
Contributor

weekface commented Apr 4, 2019

Yes

@weekface weekface added the test/stability stability tests label Apr 4, 2019
yahonda pushed a commit that referenced this issue Dec 27, 2021