Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: speed up DROP COLUMN #35499

Closed
vivekmenezes opened this issue Mar 7, 2019 · 1 comment
Closed

sql: speed up DROP COLUMN #35499

vivekmenezes opened this issue Mar 7, 2019 · 1 comment
Labels
A-schema-changes C-performance Perf of queries or internals. Solution not expected to change functional behavior. O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs

Comments

@vivekmenezes
Copy link
Contributor

DROP COLUMN involves the following process:

  1. Move the column into the WRITE_ONLY state
  2. Move the column into the DELETE_ONLY state
  3. scan the entire table and delete the column
  4. remove the column

step 3. can take a long time because it is a full scan as well as an update of every row.

What we can do is change this to

  1. Move the column into the WRITE_ONLY state
  2. Move the column into the DELETE_ONLY state
  3. remove the column while keeping the column id in the DELETE_ONLY state so that all future UPDATE/DELETE on the row will continue to delete the column. Assume this is a transaction at time T. We do this to allow a future deletion of the column to be doable at an older timestamp T.
    Note: the column name is now reusable.
  4. The old column is deleted asynchronously at timestamp T or higher.

When is the old data deleted?

We could immediately delete all the old data but that will double the data held because of MVCC.
So instead we can wait out the data TTL and then delete the column. This will immediately trigger TTL delete on the older version of the data.

Ideally we want to use the AddSSTable bulk ingest mechanism to delete the old column at timestamp T. I don't see any concerns with using it.

@vivekmenezes vivekmenezes added C-performance Perf of queries or internals. Solution not expected to change functional behavior. O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs A-schema-changes labels Mar 7, 2019
@thoszhang
Copy link
Contributor

Closing this since I think we're taking the #47989 approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-schema-changes C-performance Perf of queries or internals. Solution not expected to change functional behavior. O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs
Projects
None yet
Development

No branches or pull requests

2 participants