
start timestamp fall behind safe point #5709

Closed
adiu opened this issue Jan 24, 2018 · 12 comments
Labels
type/question The issue belongs to a question.

Comments

@adiu

adiu commented Jan 24, 2018

/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/store/tikv/coprocessor.go:446:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/distsql/new_distsql.go:84:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/distsql/new_distsql.go:106:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/executor/distsql.go:299:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/executor/new_distsql.go:370:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/executor/distsql.go:74:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/executor/new_distsql.go:594:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/executor/adapter.go:76:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/session.go:551:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/session.go:497:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/statistics/handle.go:95:
2018/01/24 07:00:41.857 domain.go:596: [error] [stats] update stats info fail: /home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/store/tikv/error.go:26: start timestamp fall behind safe point
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/store/tikv/coprocessor.go:446:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/distsql/new_distsql.go:84:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/distsql/new_distsql.go:106:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/executor/distsql.go:299:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/executor/new_distsql.go:370:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/executor/distsql.go:74:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/executor/new_distsql.go:594:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/executor/adapter.go:76:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/session.go:551:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/session.go:497:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/statistics/handle.go:95:
2018/01/24 07:00:42.064 server.go:118: [info] [172] new connection 192.168.0.202:50879
2018/01/24 07:00:42.066 session.go:1155: [error] Failed to load common global variables.
2018/01/24 07:00:42.066 tidb.go:165: [info] RollbackTxn for ddl/autocommit error.
2018/01/24 07:00:42.066 txn.go:184: [info] [kv] Rollback txn 397614053349654529
2018/01/24 07:00:42.066 session.go:665: [warning] [172] session error:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/store/tikv/error.go:26: start timestamp fall behind safe point
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/store/tikv/coprocessor.go:446:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/distsql/new_distsql.go:84:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/distsql/new_distsql.go:106:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/executor/distsql.go:299:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/executor/new_distsql.go:370:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/executor/distsql.go:74:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/executor/new_distsql.go:594:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/executor/adapter.go:76:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/session.go:551:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/session.go:497:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/session.go:1156:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/session.go:1242:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/executor/adapter.go:263:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/executor/adapter.go:180:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/tidb.go:172:
{
"currDBName": "",
"id": 172,
"status": 2,
"strictMode": true,
"user": {
"Username": "www",
"Hostname": "192.168.0.202"
}
}
2018/01/24 07:00:42.066 server.go:284: [info] handshake error /home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/store/tikv/error.go:26: start timestamp fall behind safe point
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/store/tikv/coprocessor.go:446:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/distsql/new_distsql.go:84:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/distsql/new_distsql.go:106:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/executor/distsql.go:299:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/executor/new_distsql.go:370:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/executor/distsql.go:74:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/executor/new_distsql.go:594:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/executor/adapter.go:76:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/session.go:551:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/session.go:497:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/session.go:1156:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/session.go:1242:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/executor/adapter.go:263:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/executor/adapter.go:180:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/tidb.go:172:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/session.go:667:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/session.go:747:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/server/conn.go:557:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/server/conn.go:371:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/server/conn.go:98:
2018/01/24 07:00:42.066 server.go:278: [info] [172] close connection
2018/01/24 07:00:44.857 domain.go:596: [error] [stats] update stats info fail: /home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/store/tikv/error.go:26: start timestamp fall behind safe point
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/store/tikv/coprocessor.go:446:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/distsql/new_distsql.go:84:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/distsql/new_distsql.go:106:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/executor/distsql.go:299:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/executor/new_distsql.go:370:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/executor/distsql.go:74:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/executor/new_distsql.go:594:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/executor/adapter.go:76:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/session.go:551:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/session.go:497:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/statistics/handle.go:95:
2018/01/24 07:00:44.859 session.go:665: [warning] [0] session error:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/store/tikv/error.go:26: start timestamp fall behind safe point
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/store/tikv/coprocessor.go:446:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/distsql/new_distsql.go:84:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/distsql/new_distsql.go:106:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/executor/distsql.go:299:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/executor/new_distsql.go:370:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/executor/distsql.go:74:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/executor/new_distsql.go:594:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/executor/write.go:1413:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/executor/write.go:1344:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/executor/adapter.go:235:
/home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/tidb.go:172:
{
"currDBName": "",
"id": 0,
"status": 3,
"strictMode": true,
"txn": "397614054083657731",
"user": null
}

@tiancaiamao
Contributor

What's your TiDB version, and what were you doing when you saw this error log? @adiu

It seems it doesn't recover and TiDB refuses new connections? That's odd. @disksing

2018/01/24 07:00:42.066 server.go:284: [info] handshake error /home/jenkins/workspace/build_tidb_1.0/go/src/github.com/pingcap/tidb/store/tikv/error.go:26: start timestamp fall behind safe point

@adiu adiu closed this as completed Jan 25, 2018
@tiancaiamao
Contributor

Did this happen on an old version of TiDB, or can the problem no longer be reproduced? @adiu

./bin/tidb-server -V

would print the TiDB version.

Anyway, feel free to reopen the issue if you hit this problem again,
and we would appreciate any further information you can provide to help us find potential bugs.
Thanks. @adiu

@adiu adiu reopened this Jan 25, 2018
@adiu
Author

adiu commented Jan 25, 2018

./bin/tidb-server -V
Release Version: v1.0.4-1-ga05e014
Git Commit Hash: a05e0146bd7f085938d2631b3726cca7d102537e
Git Commit Branch: release-1.0
UTC Build Time:  2017-12-11 03:07:11

@tiancaiamao Sorry, my English is not good, so I didn't reply in time.
I have gone through the code; the problem has been found and resolved.

The problem occurred because the clock on one of the physical machines was not synchronized, so there was a time offset. The tidb node could not pass the check at coprocessor.go:444, it.store.CheckVisibility(it.req.StartTs).
Tracing further, I located the cause: the tidb-server gcworker's safe_point is time-related.

The fix was rather crude: stop all tidb-server instances, use etcdctl to delete /tidb/store/gcworker/saved_safe_point, then restart.
If there is a better solution, please advise.
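As a rough illustration of the check that fails here, below is a minimal Go sketch; it is not the actual TiDB source, and the function name is only illustrative, but the two timestamps are taken from the logs in this issue (the rollback txn from the error log above, and the first safe point written by the GC leader on the fast-clock host in the "starts GC job" log later in this thread):

// A minimal sketch (not the actual TiDB source) of the idea behind
// "start timestamp fall behind safe point": a transaction's start TSO must not
// be older than the GC safe point, otherwise the versions it needs to read may
// already have been garbage-collected.
package main

import (
	"errors"
	"fmt"
)

var errStartTSFallBehind = errors.New("start timestamp fall behind safe point")

// checkVisibility mirrors the idea behind it.store.CheckVisibility(startTS).
func checkVisibility(startTS, safePoint uint64) error {
	if startTS < safePoint {
		return errStartTSFallBehind
	}
	return nil
}

func main() {
	// The GC leader on the fast-clock host pushed the safe point into the
	// "future", so every new transaction started behind it and failed.
	safePoint := uint64(397617353706700800) // from the "starts GC job" log in this issue
	startTS := uint64(397614053349654529)   // txn seen in the error log above
	fmt.Println(checkVisibility(startTS, safePoint))
}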

@disksing
Contributor

Hi @adiu, thanks for your report. We need some more information to help reproduce/diagnose the problem:

  1. You mentioned that one of your hosts' clocks is not synchronized. We need to know which processes are deployed on that host, and whether its clock is behind or ahead of the other hosts.
  2. In tidb's log, grep for "starts GC job" so we can extract the GC history.
  3. Using a mysql client, run select * from mysql.tidb where variable_name like "tikv_gc%"; to display the GC config.

@adiu
Author

adiu commented Jan 25, 2018

@disksing
The physical machine with the problem was ahead by a few hours or so; pd and tikv-server are deployed on that machine.
"starts GC job" log:

2018/01/24 00:09:45.663 gc_worker.go:318: [info] [gc worker] 583a19b9ff80007 starts GC job, safePoint: 397615452425814016
2018/01/24 00:29:45.736 gc_worker.go:318: [info] [gc worker] 583a19b9ff80007 starts GC job, safePoint: 397615767011721216
2018/01/24 00:49:45.663 gc_worker.go:318: [info] [gc worker] 583a19b9ff80007 starts GC job, safePoint: 397616081571414016
2018/01/24 01:09:45.694 gc_worker.go:318: [info] [gc worker] 583a19b9ff80007 starts GC job, safePoint: 397616396144214016
2018/01/24 01:29:45.765 gc_worker.go:318: [info] [gc worker] 583a19b9ff80007 starts GC job, safePoint: 397616710743228416
2018/01/24 01:49:45.689 gc_worker.go:318: [info] [gc worker] 583a19b9ff80007 starts GC job, safePoint: 397617025289814016
2018/01/24 10:40:32.079 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397617353706700800
2018/01/24 11:00:32.079 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397617668279500800
2018/01/24 11:20:32.079 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397617982852300800
2018/01/24 11:40:32.080 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397618297425100800
2018/01/24 12:00:32.079 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397618611997900800
2018/01/24 12:20:32.080 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397618926557593600
2018/01/24 12:40:32.079 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397619241130393600
2018/01/24 13:00:32.079 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397619555703193600
2018/01/24 13:20:32.079 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397619870275993600
2018/01/24 13:40:32.079 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397620184848793600
2018/01/24 14:00:32.079 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397620499421593600
2018/01/24 14:20:32.078 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397620813994393600
2018/01/24 14:40:32.079 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397621128567193600
2018/01/24 15:00:32.078 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397621443139993600
2018/01/24 15:20:32.079 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397621757712793600
2018/01/24 15:40:32.079 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397622072285593600
2018/01/24 16:00:32.078 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397622386858393600
2018/01/24 16:20:32.079 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397622701431193600
2018/01/24 16:40:32.079 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397623016003993600
2018/01/24 17:00:32.079 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397623330576793600
2018/01/24 17:20:32.079 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397623645149593600
2018/01/24 17:40:32.079 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397623959722393600
2018/01/24 18:00:32.080 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397624274295193600
2018/01/24 18:20:32.080 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397624588854886400
2018/01/24 18:40:32.079 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397624903427686400
2018/01/24 19:00:32.078 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397625218000486400
2018/01/24 19:20:32.078 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397625532573286400
2018/01/24 19:40:32.079 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397625847146086400
2018/01/24 20:00:32.080 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397626161718886400
2018/01/24 20:20:32.079 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397626476291686400
2018/01/24 20:40:32.078 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397626790864486400
2018/01/24 21:00:32.079 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397627105437286400
2018/01/24 21:20:32.079 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397627420010086400
2018/01/24 21:40:32.078 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397627734582886400
2018/01/24 22:00:32.079 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397628049155686400
2018/01/24 22:20:32.078 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397628363728486400
2018/01/24 22:40:32.079 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397628678301286400
2018/01/24 23:00:32.080 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397628992874086400
2018/01/24 23:20:32.079 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397629307446886400
2018/01/24 23:40:32.078 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397629622019686400
2018/01/25 00:00:32.081 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397629936592486400
2018/01/25 00:20:32.078 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397630251152179200
2018/01/25 00:40:32.079 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397630565724979200
2018/01/25 01:00:32.079 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397630880297779200
2018/01/25 01:20:32.079 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397631194870579200
2018/01/25 01:40:32.079 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397631509443379200
2018/01/25 02:00:32.080 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397631824016179200
2018/01/25 02:20:32.079 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397632138588979200
2018/01/25 02:40:32.079 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397632453161779200
2018/01/25 03:00:32.080 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397632767734579200
2018/01/25 03:20:32.078 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397633082307379200
2018/01/25 03:40:32.080 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397633396880179200
2018/01/25 04:00:32.078 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397633711452979200
2018/01/25 04:20:32.078 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397634026025779200
2018/01/25 04:40:32.080 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397634340598579200
2018/01/25 05:00:32.079 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397634655171379200
2018/01/25 05:20:32.080 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397634969744179200
2018/01/25 05:40:32.082 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397635284303872000
2018/01/25 06:00:32.079 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397635598889779200
2018/01/25 06:20:32.079 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397635913462579200
2018/01/25 06:40:32.079 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397636228022272000
2018/01/25 07:00:32.079 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397636542595072000
2018/01/25 07:20:32.079 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397636857167872000
2018/01/25 07:40:32.079 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397637171740672000
2018/01/25 08:00:32.078 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397637486313472000
2018/01/25 08:20:32.079 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397637800886272000
2018/01/25 08:40:32.080 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397638115459072000
2018/01/25 09:00:32.077 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397638430109728768
2018/01/25 09:15:32.077 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397638666039328768
2018/01/25 09:35:32.077 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397638980612128768
2018/01/25 09:55:32.077 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397639295184928768
2018/01/25 10:15:32.077 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397639609757728768
2018/01/25 10:35:32.077 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397639924330528768
2018/01/25 10:55:32.077 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397640238903328768
2018/01/25 11:15:32.077 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397640553476128768
2018/01/25 11:35:32.077 gc_worker.go:318: [info] [gc worker] 5849deb05600006 starts GC job, safePoint: 397640868048928768

select * from mysql.tidb where variable_name like "tikv_gc%";

+-----------------------+-------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+
| VARIABLE_NAME         | VARIABLE_VALUE                                                                                         | COMMENT                                                                                |
+-----------------------+--------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+
| tikv_gc_leader_uuid   | 5849deb05600006                                                                                        | Current GC worker leader UUID. (DO NOT EDIT)                                           |
| tikv_gc_leader_desc   | host:localhost.localdomain, pid:12602, start at 2018-01-24 09:23:31.904688152 +0000 UTC m=+0.160498936 | Host name and pid of current GC leader. (DO NOT EDIT)                                  |
| tikv_gc_leader_lease  | 20180125-12:04:32 +0000 UTC                                                                            | Current GC worker leader lease. (DO NOT EDIT)                                          |
| tikv_gc_run_interval  | 10m0s                                                                                                  | GC run interval, at least 10m, in Go format.                                           |
| tikv_gc_life_time     | 10m0s                                                                                                  | All versions within life time will not be collected by GC, at least 10m, in Go format. |
| tikv_gc_last_run_time | 20180125-11:55:32 +0000 UTC                                                                            | The time when last GC starts. (DO NOT EDIT)                                            |
| tikv_gc_safe_point    | 20180125-11:45:32 +0000 UTC                                                                            | All versions after safe point can be accessed. (DO NOT EDIT)                           |
+-----------------------+--------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+
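
The safePoint values in the "starts GC job" log above are TSO timestamps. As a hedged aside, the helper below is not part of TiDB; it only assumes the usual composition of such a TSO (physical time in milliseconds shifted left by 18 bits, plus a logical counter), which makes it easy to compare the GC log against wall-clock time:

// Hedged helper (not from the TiDB codebase): decode a TSO value like the
// safePoint numbers in the GC log above, assuming physical-ms << 18 | logical.
package main

import (
	"fmt"
	"time"
)

const logicalBits = 18

func tsoToTime(ts uint64) time.Time {
	physicalMs := int64(ts >> logicalBits)
	return time.Unix(physicalMs/1000, (physicalMs%1000)*int64(time.Millisecond)).UTC()
}

func main() {
	// First safePoint logged after the GC leader changed at 2018/01/24 10:40:32.
	// With tikv_gc_life_time = 10m0s it decodes to roughly ten minutes before
	// that log line.
	fmt.Println(tsoToTime(397617353706700800))
}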

@disksing
Contributor

Thanks for your feedback @adiu.
There could be something wrong with timestamp allocation when the time and time zone configurations are inconsistent. Please run the following commands on each pd server:

date +"%y-%m-%d %H:%M:%S %z"
grep -E "(sync and save timestamp)|(PD cluster leader)" pd*.log*

@adiu
Author

adiu commented Jan 26, 2018

@disksing The clocks have already been adjusted and synchronized, so running that now would not tell us anything.

grep -E "(sync and save timestamp)|(PD cluster leader)" pd*.log*

pd1:

pd-2018-01-24T11-02-54.181.log:2018/01/16 19:51:25.772 tso.go:104: [info] sync and save timestamp: last 2018-01-16 11:46:00.816257907 +0000 UTC save 2018-01-16 19:51:28.730234442 +0000 UTC m=+376553.314734271
pd-2018-01-24T11-02-54.181.log:2018/01/16 19:51:25.772 leader.go:249: [info] PD cluster leader pd1 is ready to serve
pd-2018-01-24T11-02-54.181.log:2018/01/16 19:52:27.192 tso.go:104: [info] sync and save timestamp: last 2018-01-16 19:52:17.073090325 +0000 UTC save 2018-01-16 19:52:30.18167106 +0000 UTC m=+376614.766170918
pd-2018-01-24T11-02-54.181.log:2018/01/16 19:52:27.192 leader.go:249: [info] PD cluster leader pd1 is ready to serve
pd-2018-01-24T11-02-54.181.log:2018/01/16 20:04:56.044 tso.go:104: [info] sync and save timestamp: last 2018-01-16 20:04:45.942251329 +0000 UTC save 2018-01-16 20:04:59.032757832 +0000 UTC m=+377363.617257659
pd-2018-01-24T11-02-54.181.log:2018/01/16 20:04:56.044 leader.go:249: [info] PD cluster leader pd1 is ready to serve
pd-2018-01-24T11-02-54.181.log:2018/01/16 20:06:09.618 tso.go:104: [info] sync and save timestamp: last 2018-01-16 20:05:59.494834329 +0000 UTC save 2018-01-16 20:06:12.59139305 +0000 UTC m=+377437.175892877
pd-2018-01-24T11-02-54.181.log:2018/01/16 20:06:09.618 leader.go:249: [info] PD cluster leader pd1 is ready to serve
pd-2018-01-24T11-02-54.181.log:2018/01/16 20:07:04.190 tso.go:104: [info] sync and save timestamp: last 2018-01-16 20:07:04.16838147 +0000 UTC save 2018-01-16 20:07:07.169513837 +0000 UTC m=+377491.754013817
pd-2018-01-24T11-02-54.181.log:2018/01/16 20:07:04.190 leader.go:249: [info] PD cluster leader pd1 is ready to serve
pd-2018-01-24T11-02-54.181.log:2018/01/17 10:41:40.312 tso.go:104: [info] sync and save timestamp: last 2018-01-17 10:11:35.540276396 +0000 UTC save 2018-01-17 10:41:43.221848579 +0000 UTC m=+429967.806348405
pd-2018-01-24T11-02-54.181.log:2018/01/17 10:41:40.312 leader.go:249: [info] PD cluster leader pd1 is ready to serve
pd-2018-01-24T11-02-54.181.log:2018/01/17 10:50:19.908 tso.go:104: [info] sync and save timestamp: last 2018-01-17 10:42:07.412558762 +0000 UTC save 2018-01-17 10:50:22.896462353 +0000 UTC m=+6.554625125
pd-2018-01-24T11-02-54.181.log:2018/01/17 10:50:19.908 leader.go:249: [info] PD cluster leader pd1 is ready to serve
pd-2018-01-24T11-02-54.181.log:2018/01/17 12:53:47.429 tso.go:104: [info] sync and save timestamp: last 2018-01-17 12:15:10.958865751 +0000 UTC save 2018-01-17 12:53:50.391589669 +0000 UTC m=+238.607046577
pd-2018-01-24T11-02-54.181.log:2018/01/17 12:53:47.429 leader.go:249: [info] PD cluster leader pd1 is ready to serve
pd-2018-01-24T11-02-54.181.log:2018/01/17 14:59:20.639 tso.go:104: [info] sync and save timestamp: last 2018-01-17 14:41:36.179524804 +0000 UTC save 2018-01-17 14:59:23.623890677 +0000 UTC m=+407.106243739
pd-2018-01-24T11-02-54.181.log:2018/01/17 14:59:20.639 leader.go:249: [info] PD cluster leader pd1 is ready to serve
pd-2018-01-24T11-02-54.181.log:2018/01/24 09:21:48.350 tso.go:104: [info] sync and save timestamp: last 2018-01-24 09:19:52.35713181 +0000 UTC save 2018-01-24 09:21:51.321101867 +0000 UTC m=+9.180898950
pd-2018-01-24T11-02-54.181.log:2018/01/24 09:21:48.350 leader.go:249: [info] PD cluster leader pd1 is ready to serve

pd2:

pd.log:2017/12/13 16:24:58.214 tso.go:104: [info] sync and save timestamp: last 0001-01-01 00:00:00 +0000 UTC save 2017-12-13 16:25:01.20352829 +0800 CST m=+9.281878754
pd.log:2017/12/13 16:24:58.214 leader.go:249: [info] PD cluster leader pd2 is ready to serve
pd.log:2018/01/12 03:05:13.456 tso.go:104: [info] sync and save timestamp: last 2018-01-12 02:35:10.502678804 +0000 UTC save 2018-01-12 03:05:16.439800993 +0000 UTC m=+638.736887628
pd.log:2018/01/12 03:05:13.456 leader.go:249: [info] PD cluster leader pd2 is ready to serve
pd.log:2018/01/12 03:15:47.516 tso.go:104: [info] sync and save timestamp: last 2018-01-12 03:13:35.8066631 +0000 UTC save 2018-01-12 03:15:50.499412772 +0000 UTC m=+11.024729017
pd.log:2018/01/12 03:15:47.516 leader.go:249: [info] PD cluster leader pd2 is ready to serve
pd.log:2018/01/24 06:32:10.087 tso.go:104: [info] sync and save timestamp: last 0001-01-01 00:00:00 +0000 UTC save 2018-01-24 06:32:13.059279678 +0000 UTC m=+5.358754345
pd.log:2018/01/24 06:32:10.087 leader.go:249: [info] PD cluster leader pd2 is ready to serve

pd3:

pd.log:2017/12/13 13:33:41.374 tso.go:104: [info] sync and save timestamp: last 2017-12-13 13:26:57.214976587 +0000 UTC save 2017-12-13 13:33:44.363415168 +0000 UTC m=+5.104879613
pd.log:2017/12/13 13:33:41.374 leader.go:249: [info] PD cluster leader pd3 is ready to serve
pd.log:2018/01/12 02:27:11.952 tso.go:104: [info] sync and save timestamp: last 2018-01-11 14:32:12.374889535 +0000 UTC save 2018-01-12 02:27:14.918647635 +0000 UTC m=+60.644332973
pd.log:2018/01/12 02:27:11.952 leader.go:249: [info] PD cluster leader pd3 is ready to serve
pd.log:2018/01/24 07:53:44.296 tso.go:104: [info] sync and save timestamp: last 2018-01-24 07:53:10.037574379 +0000 UTC save 2018-01-24 07:53:47.289334959 +0000 UTC m=+10.691696472
pd.log:2018/01/24 07:53:44.296 leader.go:249: [info] PD cluster leader pd3 is ready to serve
pd.log:2018/01/24 08:02:07.056 tso.go:104: [info] sync and save timestamp: last 2018-01-24 08:01:15.396717796 +0000 UTC save 2018-01-24 08:02:10.032121541 +0000 UTC m=+14.152752320
pd.log:2018/01/24 08:02:07.056 leader.go:249: [info] PD cluster leader pd3 is ready to serve
pd.log:2018/01/25 08:59:19.497 tso.go:104: [info] sync and save timestamp: last 2018-01-25 08:59:15.000584766 +0000 UTC save 2018-01-25 08:59:22.468427078 +0000 UTC m=+85071.897644150
pd.log:2018/01/25 08:59:19.497 leader.go:249: [info] PD cluster leader pd3 is ready to serve

@disksing
Contributor

Hi @adiu, I have noticed this unusual log in pd2:

pd.log:2018/01/24 06:32:10.087 tso.go:104: [info] sync and save timestamp: last 0001-01-01 00:00:00 +0000 UTC save 2018-01-24 06:32:13.059279678 +0000 UTC m=+5.358754345

It seems that pd was not able to load the timestamp from etcd.
Did you reset and restart the cluster at this point in time, or did you manually clean up the data that pd had saved in etcd?

@disksing
Contributor

BTW, I'd like to know which pd's clock was ahead. @adiu

@adiu
Author

adiu commented Jan 30, 2018

@disksing When the problem occurred, pd1's clock was ahead. After adjusting the time, pd still could not start normally; I mentioned that in another issue. Later, after going through the code and finding the problem, I used etcdctl to delete /pd/6498944785535040113/timestamp.

@disksing
Contributor

hi @adiu, the timestamp we save in etcd is there to ensure that the TS never decreases, which is vital for the correctness of transactions. Under normal circumstances, we do not want users to delete it manually... In your cluster, the rolled-back TS may result in some lost updates.
btw, we have a proposal to stay available even when we hit clock drift problems. Please check tikv/pd#940
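
To illustrate why that persisted timestamp matters, here is a minimal Go sketch; it is not PD's actual code, and the function name and values are only assumptions for illustration, but it captures the monotonicity rule described above: a newly elected TSO leader must start from the larger of its own clock and the bound saved in etcd, so timestamps never go backwards even when a previous leader's clock was fast.

// Hedged sketch (not PD's actual implementation): pick the starting point for
// a newly elected TSO leader so allocated timestamps never decrease.
package main

import (
	"fmt"
	"time"
)

func nextTSOBase(now, saved time.Time) time.Time {
	if now.After(saved) {
		return now
	}
	// Local clock is behind the persisted bound: keep allocating from the
	// saved value instead of handing out smaller timestamps.
	return saved
}

func main() {
	saved := time.Date(2018, 1, 24, 10, 30, 0, 0, time.UTC) // bound written by the fast-clock leader
	now := time.Date(2018, 1, 24, 7, 0, 0, 0, time.UTC)     // real time on the new leader
	fmt.Println(nextTSOBase(now, saved))                     // allocation resumes from the saved bound
}

Deleting the saved key removes this guarantee, which is why manually clearing it can lead to reused (rolled-back) timestamps and lost updates.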

@zhexuany
Contributor

zhexuany commented Jun 13, 2018

closing due to age, but feel free to reopen this.

@shenli shenli added the type/question The issue belongs to a question. label Jun 13, 2018