
support general big tx #779

Closed
ghost opened this issue Sep 3, 2021 · 5 comments
Labels: VERIFIED (verified issue)
Milestone: 3.21.09.0

Comments

ghost commented Sep 3, 2021

Goals:

  • Support large transactions (e.g. 16 GB) on a dtle host with little memory (e.g. 2 GB)
  • A single row of data (including extra overhead during processing) must not exceed the memory limit

Reference test case:

drop table if exists big;
create table if not exists big (id int primary key auto_increment, val longtext);
set @a = repeat('a', 64*1024*1024);

begin;
insert into big values (0, @a);
insert into big values (0, @a);
insert into big values (0, @a);
insert into big values (0, @a);
-- ...
commit;
ghost commented Sep 7, 2021

Plan:

  • When the extractor (binlog_reader) encounters a big transaction, it splits the transaction by rows and sends the pieces to the target side (see the sketch after this list).
    • That is, a single BinlogEntry may be an incomplete TX.
  • When the applier receives an incomplete TX, it executes it single-threaded.
    • For an incomplete TX, commit only when the last piece arrives.
  • When a big transaction might exceed free_mem, apply a series of throttles, delaying the reception of subsequent binlog from MySQL.
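A minimal Go sketch of this split-and-apply flow (dtle itself is written in Go). The types and fields below (RowEvent, BinlogEntry, IsLast, the send/exec callbacks) are hypothetical illustrations; only gno/index/isBig echo fields visible in the dtle logs further down, and bigTxSplittingSize is the config knob mentioned later.

package bigtx

// Hypothetical types for illustration; dtle's real structures differ.
type RowEvent struct {
	Size int64 // approximate in-memory size of this row event
}

type BinlogEntry struct {
	GNO    int64      // GTID sequence number of the transaction
	Index  int        // fragment index within the split transaction
	IsBig  bool       // true if this entry is a fragment of a split tx
	IsLast bool       // true on the final fragment; the applier commits there
	Events []RowEvent
}

// splitBigTx cuts one transaction's row events into fragments of at most
// `limit` bytes (cf. bigTxSplittingSize) and hands each fragment to send()
// as its own, possibly incomplete, BinlogEntry.
func splitBigTx(gno int64, rows []RowEvent, limit int64, send func(BinlogEntry)) {
	var frag []RowEvent
	var fragSize int64
	index := 0
	for i, ev := range rows {
		frag = append(frag, ev)
		fragSize += ev.Size
		if last := i == len(rows)-1; fragSize >= limit || last {
			send(BinlogEntry{GNO: gno, Index: index, IsBig: true, IsLast: last, Events: frag})
			index++
			frag, fragSize = nil, 0
		}
	}
}

// applyEntry shows the applier-side rule: fragments of a big tx run
// single-threaded, and COMMIT is issued only for the last fragment.
func applyEntry(e BinlogEntry, exec func(stmt string)) {
	if !e.IsBig || e.Index == 0 {
		exec("BEGIN")
	}
	for range e.Events {
		exec("/* apply one row */")
	}
	if !e.IsBig || e.IsLast {
		exec("COMMIT")
	}
}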

Potential problems:

  • Delaying binlog reception may trigger MySQL's net_write_timeout.
    • A binlog relay can avoid this to some extent,
    • but the relay itself has its own problems.

2021-11-24 update:

  • After sending a piece of a (split) big transaction, the dtle source side enters a wait mode: it does not fetch the next event until the target side has applied the piece successfully and replied.
  • It waits at most @@net_write_timeout / 2 to avoid being disconnected (a sketch of deriving this bound follows).
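A sketch (not dtle's actual code) of deriving that wait bound on the source connection; SELECT @@net_write_timeout is standard MySQL, the surrounding function is illustrative.

package bigtx

import (
	"database/sql"
	"time"
)

// maxAckWait reads MySQL's @@net_write_timeout (in seconds) and returns
// half of it: the longest the source side may wait for a target-side ack
// before it must resume reading binlog to keep the connection alive.
func maxAckWait(db *sql.DB) (time.Duration, error) {
	var seconds int64
	if err := db.QueryRow("SELECT @@net_write_timeout").Scan(&seconds); err != nil {
		return 0, err
	}
	return time.Duration(seconds) * time.Second / 2, nil
}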

@ghost ghost changed the title big tx (TODO) support general big tx Sep 7, 2021
@ghost ghost added this to the 3.21.09.0 milestone Sep 7, 2021
ghost commented Sep 18, 2021

Upstream throttling plan:

After the dtle src side sends one piece of a (split) big transaction (a BinlogEntry), it waits for a reply from dtle dst before fetching the next event from the source.

  • Ordinary BinlogEntries do not require a reply from dtle dst.
  • The wait for a reply is time-limited, so that a long wait cannot trigger net_write_timeout on the source MySQL.
    • Note that replies arriving after the timeout must be ignored (see the sketch after this list).

Other considerations:

  • Depending on available memory and the number of tasks, consider limiting how many jobs may execute big transactions concurrently.
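A sketch of the time-limited wait with late replies ignored, assuming acks arrive on a channel and carry (gno, index) to identify the fragment; all names here are hypothetical.

package bigtx

import "time"

// txAck identifies which big-tx fragment the target reports as applied.
type txAck struct {
	gno   int64
	index int
}

// waitForAck blocks until the ack for (gno, index) arrives or the time
// limit (e.g. net_write_timeout/2) expires. Acks left over from an earlier
// wait that already timed out are drained and skipped rather than being
// mistaken for the current fragment's reply.
func waitForAck(acks <-chan txAck, gno int64, index int, limit time.Duration) bool {
	deadline := time.After(limit)
	for {
		select {
		case a := <-acks:
			if a.gno == gno && a.index == index {
				return true // fragment applied; safe to fetch the next event
			}
			// stale ack from a timed-out wait: ignore it
		case <-deadline:
			return false // stop waiting so the source MySQL is not stalled
		}
	}
}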

@re-f re-f pinned this issue Sep 23, 2021
ghost pushed a commit that referenced this issue Oct 15, 2021
ghost pushed a commit that referenced this issue Oct 15, 2021
ghost commented Oct 18, 2021

Problem 2

On a host with 2 GB RAM (1.6 GB available), setting dtle's bigTxSplittingSize to 128M causes an OOM; with 64M, replication completes. That puts the memory amplification factor above (1.6*1024/128 =) 12.8x, which is not ideal.

pprof/heap shows that ReadPacketReuseMem in the replication library could be an optimization point, but optimizing it brought little actual improvement.
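For reference, heap profiles like this come from Go's standard net/http/pprof handler; the listen address below is illustrative, and how a given dtle build exposes the endpoint may differ.

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers
)

func main() {
	// Inspect with: go tool pprof http://localhost:6060/debug/pprof/heap
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}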

ghost commented Nov 8, 2021

Problem 3

Big transactions must also be handled when multiple jobs run at once. If necessary, add a restriction: only one job may execute a big transaction at a time (a sketch follows).

Update: see the big_tx_max_jobs documentation.
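A sketch of how a cap like big_tx_max_jobs can be enforced with a counting semaphore; the type and method names are hypothetical, not dtle's actual implementation.

package bigtx

// bigTxLimiter is a counting semaphore sized by big_tx_max_jobs: a job
// must acquire a slot before applying a big transaction and releases it
// after the final fragment commits.
type bigTxLimiter chan struct{}

func newBigTxLimiter(maxJobs int) bigTxLimiter {
	return make(bigTxLimiter, maxJobs)
}

// acquire blocks while maxJobs big transactions are already in flight.
func (l bigTxLimiter) acquire() { l <- struct{}{} }

// release frees the slot for the next waiting job.
func (l bigTxLimiter) release() { <-l }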

asiroliu commented Dec 8, 2021

version:
9.9.9.9-master-6287a99

Steps:

  1. Create a dtle job
  2. Insert a big transaction on the source:
drop table if exists big;
create table if not exists big (id int primary key auto_increment, val longtext);
-- 4 (MB) is the maximum; beyond that the value shows as NULL
set @a = repeat('a', 4*1024*1024);

begin;
insert into big values (0, @a);
insert into big values (0, @a);
insert into big values (0, @a);
insert into big values (0, @a);
-- 512 inserts in total
commit;
  3. Check the source-side dtle log:
2021-12-08T16:45:29.833+0800 [DEBUG] client.driver_mgr.dtle: splitting big tx: driver=dtle @module=reader index=0 job=issue-migration timestamp=2021-12-08T16:45:29.833+0800
2021-12-08T16:45:29.833+0800 [DEBUG] client.driver_mgr.dtle: sendEntry: driver=dtle @module=reader events=16 gno=8 isBig=true job=issue-migration timestamp=2021-12-08T16:45:29.833+0800
...
2021-12-08T16:45:32.076+0800 [DEBUG] client.driver_mgr.dtle: bigtx_ack: driver=dtle @module=dtle.extractor gno=8 index=0 job=issue-migration timestamp=2021-12-08T16:45:32.057+0800
...
2021-12-08T16:45:32.088+0800 [DEBUG] client.driver_mgr.dtle: splitting big tx: driver=dtle job=issue-migration @module=reader index=1 timestamp=2021-12-08T16:45:32.087+0800
2021-12-08T16:45:32.089+0800 [DEBUG] client.driver_mgr.dtle: sendEntry: driver=dtle @module=reader events=16 gno=8 isBig=true job=issue-migration timestamp=2021-12-08T16:45:32.088+0800
...
2021-12-08T16:45:35.161+0800 [DEBUG] client.driver_mgr.dtle: bigtx_ack: driver=dtle @module=dtle.extractor gno=8 index=1 job=issue-migration timestamp=2021-12-08T16:45:35.094+0800
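
Reading the log: transaction gno=8 is sent in fragments (index=0, then index=1, each with events=16), and each sendEntry is followed by a bigtx_ack from the target before the next fragment is read, which matches the upstream-throttling design above.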

@asiroliu asiroliu added the VERIFIED verified issue label Dec 8, 2021
@ghost ghost unpinned this issue Dec 19, 2021
@ghost ghost mentioned this issue Jun 29, 2022
This issue was closed.