
Faster delete options #78680

Closed
wzrdtales opened this issue Mar 28, 2022 · 18 comments
Assignees
Labels
A-sql-mutations Mutation statements: UPDATE/INSERT/UPSERT/DELETE. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-community Originated from the community

Comments

@wzrdtales

wzrdtales commented Mar 28, 2022

Is your feature request related to a problem? Please describe.

Right now, inserting rows is far faster than deleting them. We have a table with 12+ billion rows and we want to start deleting everything before date XYZ. However, doing so would take forever; it would probably be faster to create a new table containing only the wanted entries than to actually delete.

Describe the solution you'd like
Support for large delete statements, if necessary by completely blocking the whole table for the duration.

Describe alternatives you've considered

Looping as the docs suggest, but it takes ages.
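
The documented loop amounts to issuing something like the following repeatedly from the client until it reports 0 rows affected (a minimal sketch; table name, column name, cutoff date, and batch size are all hypothetical):

```sql
-- One batch; the client re-runs this until no matching rows remain.
-- CockroachDB supports LIMIT on DELETE for exactly this pattern.
DELETE FROM events
WHERE created_at < '2022-01-01'
LIMIT 5000;
```

Each batch is its own transaction, which keeps individual transactions small enough to avoid retries, at the cost of total wall-clock time.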

Jira issue: CRDB-14214

@wzrdtales wzrdtales added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Mar 28, 2022
@blathers-crl

blathers-crl bot commented Mar 28, 2022

Hello, I am Blathers. I am here to help you get the issue triaged.

I was unable to automatically find someone to ping.

If we have not gotten back to your issue within a few business days, you can try the following:

  • Join our community slack channel and ask on #cockroachdb.
  • Try to find someone from here if you know they worked closely on the area, and CC them.

🦉 Hoot! I am Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl blathers-crl bot added O-community Originated from the community X-blathers-untriaged blathers was unable to find an owner labels Mar 28, 2022
@ajwerner
Contributor

If you use hash-sharded indexes, you can

  1. support time-ordered inserts in a more scalable way (which might make it harder to keep up), and
  2. parallelize the deletion of data in each shard.
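
A minimal sketch of what that could look like, assuming a hypothetical events table (hash-sharded index syntax as of CockroachDB v21.2; the bucket count and the generated shard column name are assumptions for illustration):

```sql
-- Hash-sharded primary key: rows are spread across 8 shards, so
-- time-ordered inserts don't all land on one hot range.
CREATE TABLE events (
    created_at TIMESTAMPTZ NOT NULL,
    id UUID NOT NULL DEFAULT gen_random_uuid(),
    payload JSONB,
    PRIMARY KEY (created_at, id) USING HASH WITH BUCKET_COUNT = 8
);

-- Deletes can then target each shard independently and run in
-- parallel, e.g. one worker per shard value:
DELETE FROM events
WHERE crdb_internal_created_at_id_shard_8 = 0
  AND created_at < '2022-01-01'
LIMIT 5000;
```

The trade-off is that range scans over time now touch every shard, which is why this helps write scalability more than it helps point-in-time reads.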

@wzrdtales
Author

Well, I guess that won't help with already existing data?

When you say time-ordered inserts, I would assume it would be possible to delete those quite fast, since the database could efficiently find everything before a certain time?

@wzrdtales
Author

And the biggest problem is probably the locking and retrying of transactions. Issuing fast delete queries by selecting their IDs first got us deleting only around 20,000 records per ~6 seconds. As far as I know, there is also no way to execute big operations, right? By big I mean: delete 1 million records at once, and accept blocking the whole table for ~10-40 seconds.

@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Mar 29, 2022
@rafiss rafiss added A-sql-mutations Mutation statements: UPDATE/INSERT/UPSERT/DELETE. and removed X-blathers-untriaged blathers was unable to find an owner T-sql-queries SQL Queries Team labels Mar 29, 2022
@wzrdtales
Author

So we ended up renaming the old table, recreating a new one, and copying over the rows we didn't want to delete.
Delete performance is certainly something CockroachDB has to improve; it shouldn't need to take that long.

Take this thought for example:
Why is it possible to quickly erase a whole table with TRUNCATE, but not a whole indexed subset of its data? For good archival strategies that include deleting data, currently the only sustainable approach is one table per deletion threshold. That is terrible, of course.
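
The rename-and-copy workaround described above amounts to something like this (a sketch; table and column names are hypothetical, and the LIKE clause assumes a CockroachDB version that supports it):

```sql
-- Keep-and-swap instead of deleting in place:
CREATE TABLE events_new (LIKE events INCLUDING ALL);

-- Copy only the rows worth keeping.
INSERT INTO events_new
SELECT * FROM events WHERE created_at >= '2022-01-01';

-- Swap the tables, then drop the old data in one cheap operation.
ALTER TABLE events RENAME TO events_old;
ALTER TABLE events_new RENAME TO events;
DROP TABLE events_old;
```

This works because DROP TABLE, like TRUNCATE, removes data wholesale rather than row by row, but it requires a write pause (or dual writes) during the swap.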

@wzrdtales
Author

In the end, deletion shouldn't be slower than insertion. I can insert at a rate of well over 100k rows per second, but can't even delete 10k in one.

@michae2
Collaborator

michae2 commented Apr 6, 2022

@wzrdtales thanks for the feedback. Sorry you ran into this; glad you were able to figure out a workaround. We do have some plans to speed up deletes.

Regarding your last point: unfortunately, I think it is unlikely that SQL DELETE will ever be as fast as SQL INSERT. The implementation of DELETE in CockroachDB is much closer to UPDATE than to INSERT: it has to find each row and write a deletion marker for it, rather than just writing new data. This is not unique to CockroachDB; other relational databases get around this problem by supporting DROP PARTITION as a faster way to remove data in bulk.

We have an open issue about supporting DROP PARTITION. Would this have helped in your situation?
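
For illustration, in databases with range partitioning, removing an old range is a metadata operation rather than a row-by-row delete. A MySQL-style sketch (table, columns, and partition names are hypothetical):

```sql
-- Range-partition the table by week (illustrative boundaries).
CREATE TABLE events (
    created_at DATETIME NOT NULL,
    payload TEXT
)
PARTITION BY RANGE (TO_DAYS(created_at)) (
    PARTITION p2022_w12 VALUES LESS THAN (TO_DAYS('2022-03-28')),
    PARTITION p2022_w13 VALUES LESS THAN (TO_DAYS('2022-04-04'))
);

-- Dropping a partition removes a whole week of rows in one
-- metadata operation, regardless of how many rows it holds.
ALTER TABLE events DROP PARTITION p2022_w12;
```
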

@wzrdtales
Author

wzrdtales commented Apr 6, 2022

There is a slight difference with the databases you named: they can finish deletes earlier than CockroachDB because, if in doubt, they can lock the table in their favor, while CockroachDB will just stubbornly restart a transaction, even without a lot of traffic on the table, simply because it takes too long.

I will have a look at DROP PARTITION and let you know my opinion.

@ajwerner
Contributor

ajwerner commented Apr 8, 2022

The locking you're describing is tracked in #50181

@wzrdtales
Author

@ajwerner not quite; that is about row locking, and it has unfortunately been open for a long time.

@wzrdtales
Author

@michae2 DROP PARTITION would help in this scenario: if we could put a complete week within one partition without too bad a performance impact, we could delete the oldest week once per week. One week usually means roughly 300 million rows (only for that one table; we have a few more, of course...).

@michae2
Collaborator

michae2 commented Apr 12, 2022

@vy-ton and @kevin-v-ngo tagging you to note that this is another painful deletes issue. DROP PARTITION might have helped in this case.

@wzrdtales I'm going to close this issue in favor of tracking DROP PARTITION support in #51295 and improved delete performance in the other issues linked above.

@michae2 michae2 closed this as completed Apr 12, 2022
@wzrdtales
Author

wzrdtales commented Apr 13, 2022 via email

@ajwerner
Contributor

Can you help unpack what you meant by:

@ajwerner not quite, that is about row locking and unfortunately open for a long time.

Are you suggesting that a full-table lock would be better than row locking for the relevant rows? Is the idea that row-locking doesn't capture gaps and those gaps may contain new writes and thus may invalidate the original scan?

@wzrdtales
Author

wzrdtales commented Apr 13, 2022

Other DBs allow you to explicitly lock the whole table (a sacrifice the user makes intentionally). CockroachDB does not even allow moderately long-running delete queries, because of locking, and forces them to retry; that is the main reason I see for why deleting takes so long. CockroachDB is certainly capable of finding the rows to delete faster, and would certainly be faster getting them ready for deletion, if it could act alone on that table in the meantime without needing to care about locking at all.

DROP PARTITION is more elegant, of course, since it would avoid this completely for some scenarios. But we already have one data structure where building proper partitions would be quite a strange setup, since we would delete only the hourly records older than two years, except the last one of each day (and a day might even be updated multiple weeks later and get a new entry).

@ajwerner
Contributor

Are you aware of the experimental TTL feature we're launching in 22.1? Would that be of interest to you?
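
If I have the 22.1 syntax right, row-level TTL is configured as a table storage parameter; a sketch, with a hypothetical table name and retention period:

```sql
-- Rows expire automatically once older than the TTL; a background
-- job then deletes expired rows in batches.
ALTER TABLE events SET (ttl_expire_after = '2 years');
```

This effectively automates the batched-delete loop server-side, so retention doesn't depend on client-driven delete queries.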

@wzrdtales
Author

No, not aware of it yet.

@michae2
Collaborator

michae2 commented Apr 13, 2022

In addition to #50181 I think #55896 and #52768 would also help long-running deletes avoid retries. (Is that right, @ajwerner?)
