Skip to content
This repository has been archived by the owner on Nov 24, 2023. It is now read-only.

relay didn't get latest "min location from subtasks", and when continue parsing heartbeat logic is wrong #1524

Open
lance6716 opened this issue Mar 23, 2021 · 6 comments
Assignees

Comments

@lance6716
Copy link
Collaborator

lance6716 commented Mar 23, 2021

Bug Report

Please answer these questions before submitting your issue. Thanks!

  1. What did you do? If possible, provide a recipe for reproducing the error.

two problems from https://github.com/pingcap/dm/pull/1518/checks?check_run_id=2165916757

1.1 when resuming relay from detecting upstream switching, relay didn't get the chance to run the code of "get min location from subtasks", so it will pull very old binlogs (introduced by #1507)

1.2 when we continue parsing relay log files because of needReParse, we may continue from a location between "GTIDEvent" and "XIDEvent", so replaceWithHeartbeat is wrongly calculated

@lance6716
Copy link
Collaborator Author

lance6716 commented Mar 23, 2021

for 1.1, seems the failure is caused by multiple mysql instance hiding from a same IP/source-id, so there maybe mixed server_uuid/ServerID for a relay unit.

[2021/03/22 12:29:06.581 +00:00] [DEBUG] [relay.go:466] ["receive binlog event with header"] [component="relay log"] [header="{\"Timestamp\":1616416137,\"EventType\":16,\"ServerID\":1,\"EventSize\":31,\"LogPos\":927,\"Flags\":0}"]
[2021/03/22 12:29:06.581 +00:00] [DEBUG] [relay.go:501] ["writing binlog event"] [component="relay log"] [header="{\"Timestamp\":1616416137,\"EventType\":16,\"ServerID\":1,\"EventSize\":31,\"LogPos\":927,\"Flags\":0}"]
[2021/03/22 12:29:06.582 +00:00] [DEBUG] [relay.go:466] ["receive binlog event with header"] [component="relay log"] [header="{\"Timestamp\":1616416137,\"EventType\":33,\"ServerID\":2,\"EventSize\":84,\"LogPos\":1011,\"Flags\":0}"]
[2021/03/22 12:29:06.582 +00:00] [DEBUG] [relay.go:501] ["writing binlog event"] [component="relay log"] [header="{\"Timestamp\":1616416137,\"EventType\":33,\"ServerID\":2,\"EventSize\":84,\"LogPos\":1011,\"Flags\":0}"]

@lance6716
Copy link
Collaborator Author

lance6716 commented Mar 24, 2021

we can't start parsing a WriteRowsEvent without seeing its previous TableMapEvent @GMHDBJD

[2021/03/24 20:04:55.985 +08:00] [DEBUG] [reader.go:548] ["start parse relay log file"] [task=test] [unit="binlog replication"] [component="binlog reader"] [file=/tmp/dm_test/new_relay/worker1/relay_log/09bec856-ba95-11ea-850a-58f2b4af5188.000001/mysql-bin.000001] [offset=1008]
...
[2021/03/24 20:04:55.986 +08:00] [ERROR] [reader.go:557] ["parse relay log file"] [task=test] [unit="binlog replication"] [component="binlog reader"] [file=/tmp/dm_test/new_relay/worker1/relay_log/09bec856-ba95-11ea-850a-58f2b4af5188.000001/mysql-bin.000001] [offset=1008] [error="Header &replication.EventHeader{Timestamp:0x605b2ae7, EventType:0x1e, ServerID:0x1, EventSize:0x35, LogPos:0x4e8, Flags:0x0}, Data \"ԑ\\x00\\x00\\x00\\x00\\x01\\x00\\x02\\x00\\x02\\xff\\xfc\\x03\\x00\\x00\\x00\\fEddard Stark\", Err: table id 37332: invalid table id, no corresponding table map event"] [errorVerbose="Header &replication.EventHeader{Timestamp:0x605b2ae7, EventType:0x1e, ServerID:0x1, EventSize:0x35, LogPos:0x4e8, Flags:0x0}, Data \"ԑ\\x00\\x00\\x00\\x00\\x01\\x00\\x02\\x00\\x02\\xff\\xfc\\x03\\x00\\x00\\x00\\fEddard Stark\", Err: table id 37332: invalid table id, no corresponding table map event\ngithub.com/pingcap/errors.AddStack\n\t/Users/lance/Projects/go/pkg/mod/github.com/pingcap/[email protected]/errors.go:174\ngithub.com/pingcap/errors.Trace\n\t/Users/lance/Projects/go/pkg/mod/github.com/pingcap/[email protected]/juju_adaptor.go:15\ngithub.com/siddontang/go-mysql/replication.(*BinlogParser).parseSingleEvent\n\t/Users/lance/Projects/go/pkg/mod/github.com/lance6716/[email protected]/replication/parser.go:149\ngithub.com/siddontang/go-mysql/replication.(*BinlogParser).ParseReader\n\t/Users/lance/Projects/go/pkg/mod/github.com/lance6716/[email protected]/replication/parser.go:166\ngithub.com/siddontang/go-mysql/replication.(*BinlogParser).ParseFile\n\t/Users/lance/Projects/go/pkg/mod/github.com/lance6716/[email protected]/replication/parser.go:96\ngithub.com/pingcap/dm/pkg/streamer.(*BinlogReader).parseFile\n\t/Users/lance/Projects/dm/pkg/streamer/reader.go:552\ngithub.com/pingcap/dm/pkg/streamer.(*BinlogReader).parseFileAsPossible\n\t/Users/lance/Projects/dm/pkg/streamer/reader.go:414\ngithub.com/pingcap/dm/pkg/streamer.(*BinlogReader).parseDirAsPossible\n\t/Users/lance/Projects/dm/pkg/streamer/reader.go:381\ngithub.com/pingcap/dm/pkg/streamer.(*BinlogReader).parseRelay\n\t/Users/lance/Projects/dm/pkg/streamer/reader.go:310\ngithub.com/pingcap/dm/pkg/streamer.(*BinlogReader).StartSyncByGTID.func1\n\t/Users/lance/Projects/dm/pkg/streamer/reader.go:283\nruntime.goexit\n\t/usr/local/Cellar/go/1.13.6/libexec/src/runtime/asm_amd64.s:1357"]

will fix tomorrow

@GMHDBJD
Copy link
Collaborator

GMHDBJD commented Mar 24, 2021

Seems we parse from WriteRowsEvent first when reparse. I think 1.0 has the same problem.

@lance6716
Copy link
Collaborator Author

[2021/03/24 20:04:55.986 +08:00] [ERROR] [reader.go:557] ["parse relay log file"] [task=test] [unit="binlog replication"] [component="binlog reader"] [file=/tmp/dm_test/new_relay/worker1/relay_log/09bec856-ba95-11ea-850a-58f2b4af5188.000001/mysql-bin.000001] [offset=1008] [error="Header &replication.EventHeader{Timestamp:0x605b2ae7, EventType:0x1e, ServerID:0x1, EventSize:0x35, LogPos:0x4e8, Flags:0x0}, Data "ԑ\x00\x00\x00\x00\x01\x00\x02\x00\x02\xff\xfc\x03\x00\x00\x00\fEddard Stark", Err: table id 37332: invalid table id, no corresponding table map event"] [errorVerbose="Header &replication.EventHeader{Timestamp:0x605b2ae7, EventType:0x1e, ServerID:0x1, EventSize:0x35, LogPos:0x4e8, Flags:0x0}, Data "ԑ\x00\x00\x00\x00\x01\x00\x02\x00\x02\xff\xfc\x03\x00\x00\x00\fEddard Stark", Err: table id 37332: invalid table id, no corresponding table map event\ngithub.com/pingcap/errors.AddStack\n\t/Users/lance/Projects/go/pkg/mod/github.com/pingcap/[email protected]/errors.go:174\ngithub.com/pingcap/errors.Trace\n\t/Users/lance/Projects/go/pkg/mod/github.com/pingcap/[email protected]/juju_adaptor.go:15\ngithub.com/siddontang/go-mysql/replication.(*BinlogParser).parseSingleEvent\n\t/Users/lance/Projects/go/pkg/mod/github.com/lance6716/[email protected]/replication/parser.go:149\ngithub.com/siddontang/go-mysql/replication.(*BinlogParser).ParseReader\n\t/Users/lance/Projects/go/pkg/mod/github.com/lance6716/[email protected]/replication/parser.go:166\ngithub.com/siddontang/go-mysql/replication.(*BinlogParser).ParseFile\n\t/Users/lance/Projects/go/pkg/mod/github.com/lance6716/[email protected]/replication/parser.go:96\ngithub.com/pingcap/dm/pkg/streamer.(*BinlogReader).parseFile\n\t/Users/lance/Projects/dm/pkg/streamer/reader.go:552\ngithub.com/pingcap/dm/pkg/streamer.(*BinlogReader).parseFileAsPossible\n\t/Users/lance/Projects/dm/pkg/streamer/reader.go:414\ngithub.com/pingcap/dm/pkg/streamer.(*BinlogReader).parseDirAsPossible\n\t/Users/lance/Projects/dm/pkg/streamer/reader.go:381\ngithub.com/pingcap/dm/pkg/streamer.(*BinlogReader).parseRelay\n\t/Users/lance/Projects/dm/pkg/streamer/reader.go:310\ngithub.com/pingcap/dm/pkg/streamer.(*BinlogReader).StartSyncByGTID.func1\n\t/Users/lance/Projects/dm/pkg/streamer/reader.go:283\nruntime.goexit\n\t/usr/local/Cellar/go/1.13.6/libexec/src/runtime/asm_amd64.s:1357"]

Seems we parse from WriteRowsEvent first when reparse. I think 1.0 has the same problem.

Now I think the r.parser is stateful, thus could know it has seen TableMapEvent last time it parse. I'll investigate tomorrow

@GMHDBJD
Copy link
Collaborator

GMHDBJD commented Mar 24, 2021

I remember that parser will parse TableMapEvent and WriteEvent together, and may not stateful. good night. 🤣

@lance6716
Copy link
Collaborator Author

I remember that parser will parse TableMapEvent and WriteEvent together, and may not stateful. good night. 🤣

We continue using same r.parser, so RowsEvent which is read after a resuming still has access to r.parser's known table ID.

there's some error filter about errMissingTableMapEvent. Will check if this can support DM-worker restarting on relay.meta after release

https://github.com/siddontang/go-mysql/blob/0c5789dd0bd378b4b84f99b320a2d35a80d8858f/replication/parser.go#L168

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants