load phase of `all` mode may lost data when worker frequently starts/stops task #1377

lance6716 · 2021-01-14T04:17:33Z

Bug Report

Please answer these questions before submitting your issue. Thanks!

What did you do? If possible, provide a recipe for reproducing the error.

will reproduce error later.

This bug is caused by two hidden bugs:

a) `cleanDumpFiles` is wrongly triggered. It checks `checkPoint.AllFinished()`, but that method reads an in-memory cache of the database, and the cache may not have been initialized yet.
b) The load unit doesn't return an error when SQL files that should exist according to the checkpoint are missing.
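
To illustrate a), here is a minimal sketch of how an "all finished" check over an uninitialized cache can pass vacuously. It is not DM's code: `checkpointCache`, `filePos`, `allFinishedNaive`, `allFinishedGuarded` and `load` are hypothetical names, and the assumption is that the check simply loops over cached per-file progress.

```go
package main

import "fmt"

// filePos is a hypothetical per-file progress record: bytes loaded vs. file size.
type filePos struct {
	offset, size int64
}

// checkpointCache is a hypothetical in-memory mirror of the checkpoint table.
type checkpointCache struct {
	loaded bool               // set once the cache has been filled from the database
	files  map[string]filePos // keyed by SQL file name
}

// allFinishedNaive reports true when no cached file is unfinished. On a cache
// that was never filled, the loop body never runs, so the result is vacuously true.
func (c *checkpointCache) allFinishedNaive() bool {
	for _, p := range c.files {
		if p.offset != p.size {
			return false
		}
	}
	return true
}

// allFinishedGuarded refuses to report "finished" before the cache is loaded.
func (c *checkpointCache) allFinishedGuarded() bool {
	return c.loaded && c.allFinishedNaive()
}

// load fills the cache from rows that were read from the database.
func (c *checkpointCache) load(rows map[string]filePos) {
	c.files = make(map[string]filePos, len(rows))
	for name, p := range rows {
		c.files[name] = p
	}
	c.loaded = true
}

func main() {
	c := &checkpointCache{} // worker closed before the cache was initialized
	fmt.Println("naive:", c.allFinishedNaive(), "guarded:", c.allFinishedGuarded())

	c.load(map[string]filePos{"db.t1.sql": {offset: 10, size: 100}})
	fmt.Println("naive:", c.allFinishedNaive(), "guarded:", c.allFinishedGuarded())
}
```

With a check like the naive one, any cleanup that is gated on "all finished" can run before a single file has been loaded.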

When a worker is closed too quickly, because of scheduling or network problems, it may trigger a). In `all` mode, `cleanDumpFiles` then removes the SQL files. (In `full` mode, `cleanDumpFiles` removes the whole directory, including the dump metadata, and so would cause an error.) The next time the worker resumes the task, it continues in the load unit, because `IsFresh` looks into the database rather than the cache that `checkPoint.AllFinished()` uses. When the load unit starts, it triggers b): there are no files to load, so it finishes without an error and moves on to the sync unit.
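
For b), one possible guard is to compare the files the checkpoint still expects against what is actually on disk before loading, and fail instead of finishing silently. The sketch below is an assumption about how that could look, not the fix that was merged: `missingFiles` and `checkDumpFiles` are hypothetical helpers, and the file names are made up.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// missingFiles returns the expected names for which exists reports false.
func missingFiles(expected []string, exists func(name string) bool) []string {
	var missing []string
	for _, name := range expected {
		if !exists(name) {
			missing = append(missing, name)
		}
	}
	return missing
}

// checkDumpFiles returns an error when any file the checkpoint still expects
// is absent, so the caller cannot mistake "nothing to load" for "load done".
func checkDumpFiles(expected []string, exists func(name string) bool) error {
	if missing := missingFiles(expected, exists); len(missing) > 0 {
		return fmt.Errorf("%d SQL file(s) recorded in checkpoint are missing: %v", len(missing), missing)
	}
	return nil
}

func main() {
	dir, err := os.MkdirTemp("", "dump")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	// Only one of the two files the (hypothetical) checkpoint expects is on disk.
	if err := os.WriteFile(filepath.Join(dir, "db.t1.sql"), []byte("-- data"), 0o600); err != nil {
		panic(err)
	}

	// For brevity, any Stat error is treated as "file missing".
	onDisk := func(name string) bool {
		_, err := os.Stat(filepath.Join(dir, name))
		return err == nil
	}

	fmt.Println(checkDumpFiles([]string{"db.t1.sql"}, onDisk))
	fmt.Println(checkDumpFiles([]string{"db.t1.sql", "db.t2.sql"}, onDisk))
}
```

Passing the existence check as a function keeps the comparison independent of the filesystem, which makes it easy to test.
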
What did you expect to see?
What did you see instead?

Versions of the cluster

DM version (run dmctl -V or dm-worker -V or dm-master -V):

(paste DM version here, and you must ensure versions of dmctl, DM-worker and DM-master are same)

glkappe · 2021-01-14T10:43:18Z

asktug link: https://asktug.com/t/topic/67791

lance6716 added the severity/major label Jan 14, 2021

lance6716 changed the title ~~load phase may lost data~~ load phase may lost data when worker is frquently start/stop task Jan 14, 2021

lance6716 changed the title ~~load phase may lost data when worker is frquently start/stop task~~ load phase may lost data when worker frquently starts/stops task Jan 14, 2021

lance6716 changed the title ~~load phase may lost data when worker frquently starts/stops task~~ load phase of full mode may lost data when worker frquently starts/stops task Jan 14, 2021

lance6716 changed the title ~~load phase of full mode may lost data when worker frquently starts/stops task~~ load phase of all mode may lost data when worker frquently starts/stops task Jan 14, 2021

lance6716 changed the title ~~load phase of all mode may lost data when worker frquently starts/stops task~~ load phase of all mode may lost data when worker frequently starts/stops task Jan 14, 2021

lance6716 changed the title ~~load phase of all mode may lost data when worker frequently starts/stops task~~ load phase of `all` mode may lost data when worker frequently starts/stops task Jan 14, 2021

GMHDBJD mentioned this issue Jan 14, 2021

loader: fix loader lost data when quickly starts/stops task #1378

Merged

jebter added the type/bug This issue is a bug report label Jan 16, 2021

lance6716 mentioned this issue Jan 18, 2021

Support syncing when disk is very slow #1388

Open

5 tasks

lance6716 closed this as completed in #1378 Jan 19, 2021

ti-srebot mentioned this issue Jan 19, 2021

loader: fix loader lost data when quickly starts/stops task (#1378) #1389

Merged
