Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix copy % count inaccuracy by changing the heuristics #600

Closed
ggunson opened this issue May 28, 2018 · 1 comment
Closed

Fix copy % count inaccuracy by changing the heuristics #600

ggunson opened this issue May 28, 2018 · 1 comment
Assignees
Labels

Comments

@ggunson
Copy link
Contributor

ggunson commented May 28, 2018

gh-ost's statistics, when run on a large, busy table with a lot of inserts, can become inaccurate over time. Internally we've sometimes seen cutovers become available when the migration is reported as 90% complete or less.

Copy: 1045877600/1264045717 82.7%; Applied: 884999578; Backlog: 12/1000; Time: 460h0m0s(total), 459h59m53s(copy); streamer: mysql-bin.012574:687201905; State: migrating; ETA: 95h57m17s

The 1045877600/1264045717 82.7% above is equivalent to count of rows in _gho table/count of rows in original table. Compared to the actual counts, the size of the new _gho table was 11% greater, so the % completed was 93%. (The count of rows in the original table was quite accurate, if not exact).

Currently gh-ost determines the table counts by

  1. Getting the row count at the beginning (whether the exact row count or an estimate),
  2. Parsing the binlogs: +1 for inserts and -1 for deletes.
  3. Getting the rows_affected from the insert into _gho select ... from original_table.

The counts are likely inaccurate because due to the concurrent threads of row copying and binlog parsing/applying, we don't know if a binlog INSERT results in a net 1 row increase (since it becomes a REPLACE, and might cancel out via delete+insert) and we don't know if a DELETE deletes a row (because it hadn't been copied over yet by the copy thread). Likely since gh-ost is erring on the side of too-low counts, @shlomi-noach guesses it's the parsed delete counts that are the bigger cause of the discrepancy.

If we have gh-ost get all the _gho row counts from the rows_affected of the writes to the _gho table, this should result in more accurate statistics. E.g.

if _, err := tx.Exec(buildResult.query, buildResult.args...); err != nil {

@ggunson ggunson added the bug label May 28, 2018
@shlomi-noach shlomi-noach self-assigned this May 28, 2018
ajm188 added a commit to ajm188/gh-ost that referenced this issue May 10, 2020
…istics

Addresses github#600.

When applying a DML event, check the RowsAffected on the `Result`
struct. Since all DML event queries are point queries, this will only
ever be 0 or 1. The applier then takes this value and multiplies by
the `rowsDelta` of the event, resulting in a properly-signed, accurate
row delta to use in the statistics.

If an error occurs here, log it, but do not surface this as an
actual error .. simply assume the DML affected a row and move on. It
will be inaccurate, but this is already the case.
timvaillancourt added a commit that referenced this issue Jul 14, 2021
…istics (#844)

* Check RowsAffected when applying DML events to get more accurate statistics

Addresses #600.

When applying a DML event, check the RowsAffected on the `Result`
struct. Since all DML event queries are point queries, this will only
ever be 0 or 1. The applier then takes this value and multiplies by
the `rowsDelta` of the event, resulting in a properly-signed, accurate
row delta to use in the statistics.

If an error occurs here, log it, but do not surface this as an
actual error .. simply assume the DML affected a row and move on. It
will be inaccurate, but this is already the case.

* Fix import

* update wording to warning log message

Co-authored-by: Tim Vaillancourt <[email protected]>

Co-authored-by: Tim Vaillancourt <[email protected]>
@timvaillancourt
Copy link
Collaborator

This has been implemented in #844. Thanks @ajm188!

dm-2 pushed a commit that referenced this issue Jul 7, 2022
…istics (#844)

* Check RowsAffected when applying DML events to get more accurate statistics

Addresses #600.

When applying a DML event, check the RowsAffected on the `Result`
struct. Since all DML event queries are point queries, this will only
ever be 0 or 1. The applier then takes this value and multiplies by
the `rowsDelta` of the event, resulting in a properly-signed, accurate
row delta to use in the statistics.

If an error occurs here, log it, but do not surface this as an
actual error .. simply assume the DML affected a row and move on. It
will be inaccurate, but this is already the case.

* Fix import

* update wording to warning log message

Co-authored-by: Tim Vaillancourt <[email protected]>
dm-2 pushed a commit that referenced this issue Jul 7, 2022
…istics (#844)

* Check RowsAffected when applying DML events to get more accurate statistics

Addresses #600.

When applying a DML event, check the RowsAffected on the `Result`
struct. Since all DML event queries are point queries, this will only
ever be 0 or 1. The applier then takes this value and multiplies by
the `rowsDelta` of the event, resulting in a properly-signed, accurate
row delta to use in the statistics.

If an error occurs here, log it, but do not surface this as an
actual error .. simply assume the DML affected a row and move on. It
will be inaccurate, but this is already the case.

* Fix import

* update wording to warning log message

Co-authored-by: Tim Vaillancourt <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants