Online DDL: stale migration does not update completed_timestamp, leading to uncollected garbage #8499

Closed
shlomi-noach opened this issue Jul 20, 2021 · 0 comments · Fixed by #8500

The following migration was found to be stale:

*************************** 1. row ***************************
                 id: 194
     migration_uuid: 7c35182a_e3fb_11eb_a192_4e9c3de84043
           keyspace: redacted
              shard: -
       mysql_schema: redacted
        mysql_table: usage
migration_statement: alter table `redacted` modify column redacted bigint default null
           strategy: gh-ost
            options: 
    added_timestamp: 2021-07-13 16:58:02
requested_timestamp: 0000-00-00 00:00:00
    ready_timestamp: 2021-07-14 03:35:35
  started_timestamp: 2021-07-14 03:35:35
 liveness_timestamp: 2021-07-14 03:36:36
completed_timestamp: NULL
  cleanup_timestamp: NULL
   migration_status: failed
           log_path: redacted:/tmp/online-ddl-7c35182a_e3fb_11eb_a192_4e9c3de84043-168931677
          artifacts: _7c35182a_e3fb_11eb_a192_4e9c3de84043_20210714033535_gho,_7c35182a_e3fb_11eb_a192_4e9c3de84043_20210714033535_ghc,_7c35182a_e3fb_11eb_a192_4e9c3de84043_20210714033535_del,_7c35182a_e3fb_11eb_a192_4e9c3de84043_20210714033535_gho,_7c35182a_e3fb_11eb_a192_4e9c3de84043_20210714033535_ghc,_7c35182a_e3fb_11eb_a192_4e9c3de84043_20210714033535_del,_7c35182a_e3fb_11eb_a192_4e9c3de84043_20210713165803_gho,_7c35182a_e3fb_11eb_a192_4e9c3de84043_20210713165803_ghc,_7c35182a_e3fb_11eb_a192_4e9c3de84043_20210713165803_del,_7c35182a_e3fb_11eb_a192_4e9c3de84043_20210713165803_gho,_7c35182a_e3fb_11eb_a192_4e9c3de84043_20210713165803_ghc,_7c35182a_e3fb_11eb_a192_4e9c3de84043_20210713165803_del,
            retries: 1
             tablet: redacted
     tablet_failure: 1
           progress: 0.520303
  migration_context: redacted:ccc3e80c-9797-477e-bd43-c7ee8f6ed02f
         ddl_action: alter
            message: stale migration
        eta_seconds: 11476
        rows_copied: 2065000
         table_rows: 0

Notice that although migration_status is failed, completed_timestamp remains NULL. As a result, the artifacts of this migration are never garbage collected (GC only runs 24 hours after completed_timestamp).
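To illustrate why a NULL completed_timestamp blocks cleanup, here is a minimal sketch of the eligibility rule described above. The function name and the `GC_RETENTION` constant are hypothetical and are not taken from the Vitess code base; only the 24 hour window comes from this issue.

```python
from datetime import datetime, timedelta
from typing import Optional

# Retention window as described in this issue (assumed to be a fixed 24 hours).
GC_RETENTION = timedelta(hours=24)


def artifacts_eligible_for_gc(
    completed_timestamp: Optional[datetime], now: datetime
) -> bool:
    """Return True once the retention window after completion has elapsed.

    A NULL (None) completed_timestamp never becomes eligible, which is the
    behaviour reported here for stale migrations.
    """
    if completed_timestamp is None:
        return False
    return now - completed_timestamp >= GC_RETENTION
```

With the row above, `completed_timestamp` is NULL, so the check returns False no matter how much time passes.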

We need to:

  1. Update completed_timestamp when a stale migration is detected and marked as failed, and
  2. For existing migrations already in this state, set completed_timestamp wherever it is NULL and migration_status is failed.
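The two steps above can be sketched as follows. This is an illustration only, not the actual fix: it uses an in-memory SQLite database in place of MySQL, a simplified table assumed to be named `schema_migrations`, and hypothetical function names. The issue does not say which value should be written to completed_timestamp, so the sketch takes it from the caller.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create a simplified, hypothetical migrations table with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE schema_migrations (
               migration_uuid TEXT PRIMARY KEY,
               migration_status TEXT,
               message TEXT,
               completed_timestamp TEXT
           )"""
    )
    conn.executemany(
        "INSERT INTO schema_migrations VALUES (?, ?, ?, ?)",
        [
            ("uuid_running", "running", "", None),
            ("uuid_old_failed", "failed", "stale migration", None),
            ("uuid_complete", "complete", "", "2021-07-13 10:00:00"),
        ],
    )
    return conn


def mark_stale_migration_failed(conn: sqlite3.Connection, uuid: str, now: str) -> None:
    """Step 1: when a stale migration is failed, also set completed_timestamp."""
    conn.execute(
        """UPDATE schema_migrations
              SET migration_status = 'failed',
                  message = 'stale migration',
                  completed_timestamp = ?
            WHERE migration_uuid = ?""",
        (now, uuid),
    )


def backfill_completed_timestamp(conn: sqlite3.Connection, now: str) -> int:
    """Step 2: fix existing failed rows whose completed_timestamp is NULL.

    Returns the number of rows updated.
    """
    cur = conn.execute(
        """UPDATE schema_migrations
              SET completed_timestamp = ?
            WHERE migration_status = 'failed'
              AND completed_timestamp IS NULL""",
        (now,),
    )
    return cur.rowcount
```

Once both steps have run, every failed migration has a non-NULL completed_timestamp, so the 24 hour GC window can start counting for its artifacts.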