Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

drainer报错: filterTable failed: not found table id: 49499 #1143

Open
snow-with-tea opened this issue Mar 17, 2022 · 0 comments
Open

drainer报错: filterTable failed: not found table id: 49499 #1143

snow-with-tea opened this issue Mar 17, 2022 · 0 comments

Comments

@snow-with-tea
Copy link

【 TiDB 使用环境】
生产环境

【概述】 场景 + 问题概述
未做配置调整,突然开始频繁报错退出

【备份和数据迁移策略逻辑】
pump – drainer – kafka

【背景】 做过哪些操作
无运维操作(不排除有数据库脏数据因素)

【现象】 业务和数据库现象
同步中断

【问题】 当前遇到的问题
已经异常退出多次,报错是not found table id,但是根据table id有时找不到表,有时找能到表

找不到时:
curl http://tidb-server01:10080/schema?table_id=49499
[schema:1146]Table which ID = 49499 does not exist.

pump未发现ERROR级报错
drainer 重启即可恢复

【业务影响】
影响数据同步

【TiDB 版本】

v4.0.4

【drainer日志】

[2022/03/17 00:35:31.084 +08:00] [INFO] [syncer.go:257] ["write save point"] [ts=431868745644769288]
[2022/03/17 00:36:00.234 +08:00] [INFO] [syncer.go:257] ["write save point"] [ts=431868753509089288]
[2022/03/17 00:36:26.977 +08:00] [INFO] [syncer.go:257] ["write save point"] [ts=431868761386516491]
[2022/03/17 00:36:56.325 +08:00] [INFO] [syncer.go:257] ["write save point"] [ts=431868769250836489]
[2022/03/17 00:37:26.923 +08:00] [INFO] [syncer.go:257] ["write save point"] [ts=431868777128263681]
[2022/03/17 00:37:27.128 +08:00] [INFO] [async_producer.go:1015] ["[sarama] Producer shutting down."]
[2022/03/17 00:37:27.128 +08:00] [INFO] [client.go:227] ["[sarama] Closing Client"]
[2022/03/17 00:37:27.128 +08:00] [INFO] [syncer.go:249] ["handleSuccess quit"]
[2022/03/17 00:37:27.128 +08:00] [INFO] [async_producer.go:717] ["[sarama] producer/broker/1 input chan closed\n"]
[2022/03/17 00:37:27.128 +08:00] [INFO] [broker.go:253] ["[sarama] Closed connection to broker 172.16.16.221:9092\n"]
[2022/03/17 00:37:27.128 +08:00] [INFO] [async_producer.go:801] ["[sarama] producer/broker/1 shut down\n"]
[2022/03/17 00:37:27.128 +08:00] [ERROR] [server.go:289] ["syncer exited abnormal"] [error="filterTable failed: not found table id: 49567"] [errorVerbose="not found table id: 49567\ngithub.com/pingcap/tidb-binlog/drainer.filterTable\n\t/Users/weizheng/go/src/github.com/letusgoing/tidb-binlog/drainer/syncer.go:514\ngithub.com/pingcap/tidb-binlog/drainer.(*Syncer).run\n\t/Users/weizheng/go/src/github.com/letusgoing/tidb-binlog/drainer/syncer.go:368\ngithub.com/pingcap/tidb-binlog/drainer.(*Syncer).Start\n\t/Users/weizheng/go/src/github.com/letusgoing/tidb-binlog/drainer/syncer.go:132\ngithub.com/pingcap/tidb-binlog/drainer.(*Server).Start.func4\n\t/Users/weizheng/go/src/github.com/letusgoing/tidb-binlog/drainer/server.go:288\ngithub.com/pingcap/tidb-binlog/drainer.(*taskGroup).start.func1\n\t/Users/weizheng/go/src/github.com/letusgoing/tidb-binlog/drainer/util.go:71\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1374\nfilterTable failed"]
[2022/03/17 00:37:27.128 +08:00] [INFO] [broker.go:253] ["[sarama] Closed connection to broker 172.16.18.27:9092\n"]
[2022/03/17 00:37:27.128 +08:00] [INFO] [util.go:68] [Exit] [name=syncer]
[2022/03/17 00:37:27.128 +08:00] [INFO] [server.go:451] ["begin to close drainer server"]
[2022/03/17 00:37:27.132 +08:00] [INFO] [server.go:416] ["has already update status"] [id=172.16.18.27:8249]
[2022/03/17 00:37:27.132 +08:00] [INFO] [server.go:455] ["commit status done"]
[2022/03/17 00:37:27.132 +08:00] [INFO] [pump.go:77] ["pump is closing"] [id=172.16.16.221:8250]
[2022/03/17 00:37:27.132 +08:00] [INFO] [util.go:68] [Exit] [name=heartbeat]
[2022/03/17 00:37:27.132 +08:00] [INFO] [pump.go:77] ["pump is closing"] [id=172.16.17.65:8250]
[2022/03/17 00:37:27.132 +08:00] [INFO] [pump.go:77] ["pump is closing"] [id=172.16.18.27:8250]
[2022/03/17 00:37:27.132 +08:00] [INFO] [collector.go:135] ["publishBinlogs quit"]
[2022/03/17 00:37:27.132 +08:00] [INFO] [util.go:68] [Exit] [name=collect]
[2022/03/17 00:37:27.132 +08:00] [INFO] [main.go:73] ["drainer exit"]

【2022/03/17 00:37:27 左右tidb-server的相关日志】

tidb-server01: (根据table id 49567 搜索)

[2022/03/17 00:37:25.175 +08:00] [ERROR] [conn.go:730] ["command dispatched failed"] [conn=87426831] [connInfo="id:87426831, addr:10.0.4.48:61564 status:0, collation:utf8_general_ci, user:dev_grid"] [command=Query] [status="inTxn:0, autocommit:0"] [sql=commit] [txn_mode=OPTIMISTIC] [err="previous statement: insert into clerk_bonus_his (ID, CLERK_CODE, BONUS_DATE,        BONUS, CREATED_NAME, CREATED_DATE,        UPDATED_NAME, UPDATED_DATE, IS_EFFECTIVE,        BONUS_YEAR, BONUS_MONTH, BONUS_DAY,        PRODUCT_CODE, COLOR, `SIZE`,        `COUNT`, SALES_AMOUNT, CLERK_BONUS_ID,        BONUS_AMOUNT, ORDER_TYPE, ORDER_CODE,        MEMBER_CODE, MEMBER_NAME, REMARK,        `TYPE`,store_code,ORDER_LINE,PERSONAL_AMOUNT,SHARE_POINT_AMOUNT,SHARE_VOUCHER_AMOUNT,PAY_AMOUNT,BILL_TYPE,       VIRTUAL_PRODUCT_CODE,VIRTUAL_PRODUCT_NAME,CARD_TYPE,BRAND_CODE,SALE_STORE       )     values (null, '99999902', '2021-02-09 00:00:00.0',        0.00, null, '2022-03-17 00:37:24.627',        null, '2022-03-17 00:37:24.627', null,        '2021', '02', '09',        null, null, null,        1, 0.01, null,        null, '门店订单', '2729400245965029896',        '', null, null,        '0','9999','2729400247307207176'       ,0.01,0.00       ,0.00,0.01, 1       ,null ,null,null       ,'DD' ,'9999'       ): [kv:1062]Duplicate entry '2729400247307207176-2729400245965029896-1' for key 'ORDER_LINE'"]
[2022/03/17 00:37:25.177 +08:00] [INFO] [schema_validator.go:246] ["the schema validator enqueue, queue is too long"] ["delta max count"=1024] ["remove schema version"=37291]
[2022/03/17 00:37:25.177 +08:00] [INFO] [2pc.go:1251] ["2PC clean up done"] [conn=87425441] [txnStartTS=431868777377300481]
[2022/03/17 00:37:25.224 +08:00] [INFO] [ddl_worker.go:773] ["[ddl] wait latest schema version changed"] [worker="worker 1, tp general"] [ver=38921] ["take time"=53.924591ms] [job="ID:49566, Type:truncate table, State:done, SchemaState:public, SchemaID:39165, TableID:49536, RowCount:0, ArgLen:2, start time: 2022-03-17 00:37:25.12 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0"]
[2022/03/17 00:37:25.226 +08:00] [INFO] [delete_range.go:346] ["[ddl] insert into delete-range table"] [jobID=49566] [elementID=49536]
[2022/03/17 00:37:25.232 +08:00] [INFO] [delete_range.go:105] ["[ddl] add job into delete-range table"] [jobID=49566] [jobType="truncate table"]
[2022/03/17 00:37:25.234 +08:00] [INFO] [ddl_worker.go:367] ["[ddl] finish DDL job"] [worker="worker 1, tp general"] [job="ID:49566, Type:truncate table, State:synced, SchemaState:public, SchemaID:39165, TableID:49536, RowCount:0, ArgLen:2, start time: 2022-03-17 00:37:25.12 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0"]
[2022/03/17 00:37:25.242 +08:00] [INFO] [ddl_worker.go:584] ["[ddl] run DDL job"] [worker="worker 1, tp general"] [job="ID:49568, Type:truncate table, State:none, SchemaState:none, SchemaID:21975, TableID:49534, RowCount:0, ArgLen:0, start time: 2022-03-17 00:37:25.12 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0"]
[2022/03/17 00:37:25.257 +08:00] [INFO] [domain.go:127] ["diff load InfoSchema success"] [usedSchemaVersion=38921] [neededSchemaVersion=38922] ["start time"=2.137927ms] [tblIDs="[49534,49567]"]
[2022/03/17 00:37:25.258 +08:00] [INFO] [schema_validator.go:246] ["the schema validator enqueue, queue is too long"] ["delta max count"=1024] ["remove schema version"=37292]
[2022/03/17 00:37:25.305 +08:00] [INFO] [ddl_worker.go:773] ["[ddl] wait latest schema version changed"] [worker="worker 1, tp general"] [ver=38922] ["take time"=52.905758ms] [job="ID:49568, Type:truncate table, State:done, SchemaState:public, SchemaID:21975, TableID:49534, RowCount:0, ArgLen:2, start time: 2022-03-17 00:37:25.12 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0"]
[2022/03/17 00:37:25.308 +08:00] [INFO] [delete_range.go:346] ["[ddl] insert into delete-range table"] [jobID=49568] [elementID=49534]
[2022/03/17 00:37:25.312 +08:00] [INFO] [delete_range.go:105] ["[ddl] add job into delete-range table"] [jobID=49568] [jobType="truncate table"]
[2022/03/17 00:37:25.314 +08:00] [INFO] [ddl_worker.go:367] ["[ddl] finish DDL job"] [worker="worker 1, tp general"] [job="ID:49568, Type:truncate table, State:synced, SchemaState:public, SchemaID:21975, TableID:49534, RowCount:0, ArgLen:2, start time: 2022-03-17 00:37:25.12 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0"]
[2022/03/17 00:37:25.323 +08:00] [INFO] [ddl.go:509] ["[ddl] DDL job is finished"] [jobID=49568]
[2022/03/17 00:37:25.323 +08:00] [INFO] [domain.go:624] ["performing DDL change, must reload"]
[2022/03/17 00:37:25.324 +08:00] [INFO] [server.go:391] ["connection closed"] [conn=87435420]
[2022/03/17 00:37:25.370 +08:00] [WARN] [session.go:471] ["can not retry txn"] [conn=87427265] [label=general] [error="[kv:1062]Duplicate entry '2729399744124944900-2729399742690492929-1' for key 'ORDER_LINE'"] [IsBatchInsert=false] [IsPessimistic=false] [InRestrictedSQL=false] [tidb_retry_limit=10] [tidb_disable_txn_auto_retry=false]
[2022/03/17 00:37:25.370 +08:00] [WARN] [session.go:487] ["commit failed"] [conn=87427265] ["finished txn"="Txn{state=invalid}"] [error="[kv:1062]Duplicate entry '2729399744124944900-2729399742690492929-1' for key 'ORDER_LINE'"]
[2022/03/17 00:37:25.370 +08:00] [ERROR] [conn.go:730] ["command dispatched failed"] [conn=87427265] [connInfo="id:87427265, addr:10.0.4.48:61667 status:0, collation:utf8_general_ci, user:dev_grid"] [command=Query] [status="inTxn:0, autocommit:0"] [sql=commit] [txn_mode=OPTIMISTIC] [err="previous statement: insert into clerk_bonus_his (ID, CLERK_CODE, BONUS_DATE,        BONUS, CREATED_NAME, CREATED_DATE,        UPDATED_NAME, UPDATED_DATE, IS_EFFECTIVE,        BONUS_YEAR, BONUS_MONTH, BONUS_DAY,        PRODUCT_CODE, COLOR, `SIZE`,        `COUNT`, SALES_AMOUNT, CLERK_BONUS_ID,        BONUS_AMOUNT, ORDER_TYPE, ORDER_CODE,        MEMBER_CODE, MEMBER_NAME, REMARK,        `TYPE`,store_code,ORDER_LINE,PERSONAL_AMOUNT,SHARE_POINT_AMOUNT,SHARE_VOUCHER_AMOUNT,PAY_AMOUNT,BILL_TYPE,       VIRTUAL_PRODUCT_CODE,VIRTUAL_PRODUCT_NAME,CARD_TYPE,BRAND_CODE,SALE_STORE       )     values (null, '99999902', '2021-02-09 00:00:00.0',        0.00, null, '2022-03-17 00:37:24.914',        null, '2022-03-17 00:37:24.914', null,        '2021', '02', '09',        null, null, null,        1, 538.00, null,        null, '门店订单', '2729399742690492929',        '', null, null,        '0','9999','2729399744124944900'       ,0.01,0.00       ,0.00,0.01, 1       ,null ,null,null       ,'DD' ,'9999'       ): [kv:1062]Duplicate entry '2729399744124944900-2729399742690492929-1' for key 'ORDER_LINE'"]
[2022/03/17 00:37:25.374 +08:00] [INFO] [2pc.go:1251] ["2PC clean up done"] [conn=87428296] [txnStartTS=431868777455943682]

tidb-server02: (根据table id 49567 搜索)

[2022/03/17 00:37:25.075 +08:00] [INFO] [2pc.go:1251] ["2PC clean up done"] [conn=88491651] [txnStartTS=431868777377300483]
[2022/03/17 00:37:25.124 +08:00] [INFO] [server.go:388] ["new connection"] [conn=88500664] [remoteAddr=172.16.10.18:52170]
[2022/03/17 00:37:25.127 +08:00] [INFO] [set.go:207] ["set session var"] [conn=88500664] [name=character_set_results] [val=NULL]
[2022/03/17 00:37:25.130 +08:00] [INFO] [session.go:2130] ["CRUCIAL OPERATION"] [conn=88500664] [schemaVersion=38920] [cur_db=pre_report] [sql="truncate table dim_str_info"] [[email protected]]
[2022/03/17 00:37:25.140 +08:00] [INFO] [ddl_worker.go:261] ["[ddl] add DDL jobs"] ["batch count"=1] [jobs="ID:49566, Type:truncate table, State:none, SchemaState:none, SchemaID:39165, TableID:49536, RowCount:0, ArgLen:1, start time: 2022-03-17 00:37:25.12 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0; "]
[2022/03/17 00:37:25.140 +08:00] [INFO] [ddl.go:477] ["[ddl] start DDL job"] [job="ID:49566, Type:truncate table, State:none, SchemaState:none, SchemaID:39165, TableID:49536, RowCount:0, ArgLen:1, start time: 2022-03-17 00:37:25.12 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0"] [query="truncate table dim_str_info"]
[2022/03/17 00:37:25.149 +08:00] [INFO] [server.go:391] ["connection closed"] [conn=88500228]
[2022/03/17 00:37:25.161 +08:00] [INFO] [server.go:391] ["connection closed"] [conn=88500232]
[2022/03/17 00:37:25.175 +08:00] [INFO] [domain.go:127] ["diff load InfoSchema success"] [usedSchemaVersion=38920] [neededSchemaVersion=38921] ["start time"=2.308932ms] [tblIDs="[49536,49565]"]
[2022/03/17 00:37:25.176 +08:00] [INFO] [schema_validator.go:246] ["the schema validator enqueue, queue is too long"] ["delta max count"=1024] ["remove schema version"=37291]
[2022/03/17 00:37:25.245 +08:00] [WARN] [session.go:471] ["can not retry txn"] [conn=88491651] [label=general] [error="[kv:1062]Duplicate entry '2729399744124944900-2729399742690492929-1' for key 'ORDER_LINE'"] [IsBatchInsert=false] [IsPessimistic=false] [InRestrictedSQL=false] [tidb_retry_limit=10] [tidb_disable_txn_auto_retry=false]
[2022/03/17 00:37:25.245 +08:00] [WARN] [session.go:487] ["commit failed"] [conn=88491651] ["finished txn"="Txn{state=invalid}"] [error="[kv:1062]Duplicate entry '2729399744124944900-2729399742690492929-1' for key 'ORDER_LINE'"]
[2022/03/17 00:37:25.245 +08:00] [ERROR] [conn.go:730] ["command dispatched failed"] [conn=88491651] [connInfo="id:88491651, addr:10.0.4.48:61461 status:0, collation:utf8_general_ci, user:dev_grid"] [command=Query] [status="inTxn:0, autocommit:0"] [sql=commit] [txn_mode=OPTIMISTIC] [err="previous statement: insert into clerk_bonus_his (ID, CLERK_CODE, BONUS_DATE,        BONUS, CREATED_NAME, CREATED_DATE,        UPDATED_NAME, UPDATED_DATE, IS_EFFECTIVE,        BONUS_YEAR, BONUS_MONTH, BONUS_DAY,        PRODUCT_CODE, COLOR, `SIZE`,        `COUNT`, SALES_AMOUNT, CLERK_BONUS_ID,        BONUS_AMOUNT, ORDER_TYPE, ORDER_CODE,        MEMBER_CODE, MEMBER_NAME, REMARK,        `TYPE`,store_code,ORDER_LINE,PERSONAL_AMOUNT,SHARE_POINT_AMOUNT,SHARE_VOUCHER_AMOUNT,PAY_AMOUNT,BILL_TYPE,       VIRTUAL_PRODUCT_CODE,VIRTUAL_PRODUCT_NAME,CARD_TYPE,BRAND_CODE,SALE_STORE       )     values (null, '99999902', '2021-02-09 00:00:00.0',        0.00, null, '2022-03-17 00:37:24.815',        null, '2022-03-17 00:37:24.815', null,        '2021', '02', '09',        null, null, null,        1, 538.00, null,        null, '门店订单', '2729399742690492929',        '', null, null,        '0','9999','2729399744124944900'       ,0.01,0.00       ,0.00,0.01, 1       ,null ,null,null       ,'DD' ,'9999'       ): [kv:1062]Duplicate entry '2729399744124944900-2729399742690492929-1' for key 'ORDER_LINE'"]
[2022/03/17 00:37:25.247 +08:00] [INFO] [2pc.go:1251] ["2PC clean up done"] [conn=88495071] [txnStartTS=431868777416622096]
[2022/03/17 00:37:25.257 +08:00] [INFO] [domain.go:127] ["diff load InfoSchema success"] [usedSchemaVersion=38921] [neededSchemaVersion=38922] ["start time"=2.565882ms] [tblIDs="[49534,49567]"]
[2022/03/17 00:37:25.259 +08:00] [INFO] [schema_validator.go:246] ["the schema validator enqueue, queue is too long"] ["delta max count"=1024] ["remove schema version"=37292]
[2022/03/17 00:37:25.292 +08:00] [INFO] [schema_validator.go:205] ["the related physical table ID is empty"] [schemaVer=38920] [latestSchemaVer=38922]
[2022/03/17 00:37:25.292 +08:00] [WARN] [session.go:471] ["can not retry txn"] [conn=88495821] [label=general] [error="[domain:8028]Information schema is changed during the execution of the statement(for example, table definition may be updated by other DDL ran in parallel). If you see this error often, try increasing `tidb_max_delta_schema_count`. [try again later]"] [IsBatchInsert=false] [IsPessimistic=true] [InRestrictedSQL=false] [tidb_retry_limit=10] [tidb_disable_txn_auto_retry=false]
[2022/03/17 00:37:25.292 +08:00] [WARN] [session.go:487] ["commit failed"] [conn=88495821] ["finished txn"="Txn{state=invalid}"] [error="[domain:8028]Information schema is changed during the execution of the statement(for example, table definition may be updated by other DDL ran in parallel). If you see this error often, try increasing `tidb_max_delta_schema_count`. [try again later]"]
[2022/03/17 00:37:25.292 +08:00] [WARN] [session.go:1041] ["run statement failed"] [conn=88495821] [schemaVersion=38920] [error="previous statement: select * from xxl_job_lock where lock_name = 'schedule_lock' for update: [domain:8028]Information schema is changed during the execution of the statement(for example, table definition may be updated by other DDL ran in parallel). If you see this error often, try increasing `tidb_max_delta_schema_count`. [try again later]"] [session="{\n  \"currDBName\": \"uat_xxljob\",\n  \"id\": 88495821,\n  \"status\": 0,\n  \"strictMode\": true,\n  \"user\": {\n    \"Username\": \"uat_job\",\n    \"Hostname\": \"172.16.13.242\",\n    \"CurrentUser\": false,\n    \"AuthUsername\": \"uat_job\",\n    \"AuthHostname\": \"172.16.13.%\"\n  }\n}"]

【两个tidb-server都已开启binlog】

image

【drainer配置】

# WARNING: This file is auto-generated. Do not edit! All your modification will be overwritten!
# You can use 'tiup cluster edit-config' and 'tiup cluster reload' to update the configuration
# All configuration items you want to change can be added to:
# server_configs:
#   drainer:
#     aa.b1.c3: value
#     aa.b2.c4: value
[syncer]
db-type = "kafka"
ignore-schemas = "INFORMATION_SCHEMA,METRICS_SCHEMA,PERFORMANCE_SCHEMA,aaa,aliprod119,bak_marketing_0925,dusto_job,goinception,mysql,stress_testing,test,test33,test55,thisaliprod,thisisuat,tidb_loader,txmanager,uat_databank,uat_goods,uat_grid,uat_info,uat_integration,uat_job,uat_marketing,uat_member,uat_message,uat_nacos,uat_order,uat_pay,uat_product,uat_report,uat_saga,uat_security,uat_stock,uat_support,uat_zipkin,pre_databank,pre_info,pre_job,pre_message,pre_nacos,pre_nacos113,pre_product,pre_report,pre_security,pre_support,dev_member,dev_databank,dev_message,dev_info,dev_order,dev_pay,dev_marketing,dev_report,dev_stock,dev_grid,dev_support,dev_security,dev_goods"

[[syncer.ignore-table]]
db-name = "test"
tbl-name = "test"

[[syncer.ignore-table]]
db-name = "pre_member"
tbl-name = "member_info_temp_20190925"
[syncer.to]
kafka-addrs = "kafka01:9092,kafka02:9092,kafka03:9092"
kafka-max-message-size = 1610612736
kafka-max-messages = 1536
kafka-version = "1.0.2"
topic-name = "tidb-binlog-pre"

【已确定目前报错的table id,能查到的都是在过滤条件中的】

image

image

有些表根据id已经查不到了,所以我无法证实

现在我开始担心后面会不会出现 ignore-error 触发 critical error的情况,按照教程中说法要重新同步整个数据,我们的数据量巨大,这是难以承受的

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant