-
Notifications
You must be signed in to change notification settings - Fork 6.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Long lived Iterator pinning flushed MemTable and compacted SSTs #10536
Comments
I agree it would be great to solve this in RocksDB. While it isn't for me to decide, I am not sure sure that will happen given the complexity. For all of the MVCC engines that I understand a long-lived transaction isn't free. It blocks vacuum in Postgres, leading to much space-amp. It blocks purge in InnoDB, leading to much space-amp. And here you describe the cost of it for RocksDB. The workaround (or solution) for Postgres and InnoDB is to kill long open transactions when they become a problem. Since this issue was inspired by a MyRocks problem, then the solution works there as well. But that is only a solution for long open transactions created by users, and in your case the problem is a long-open iterator used during index creation. So my solution implies you can only avoid the problem in MyRocks if you don't create secondary indexes on existing large tables, or use OSC to do that. Unfortunately that isn't a great workaround. I appreciate that you filed this and provided a PR for MyRocks to avoid this during create index. But again, I am not sure that RocksDB will fix this. |
Is this related to #10487? It seems doing an iterator refresh with snapshot could free the pinned resources. |
Yes, this issue has some relation to #10487, but #10487 needs to change user API. This issue should be solved without user API change, BTW: if #10487 is resolved, this issue can be solved based on it. |
Summary: This PR resolves #10487 & #10536, user code needs to call Refresh() periodically. The main code change is to support range deletions. A range tombstone iterator uses a sequence number as upper bound to decide which range tombstones are effective. During Iterator refresh, this sequence number upper bound needs to be updated for all range tombstone iterators under DBIter and LevelIterator. LevelIterator may create new table iterators and range tombstone iterator during scanning, so it needs to be aware of iterator refresh. The code path that propagates this change is `db_iter_->set_sequence(read_seq) -> MergingIterator::SetRangeDelReadSeqno() -> TruncatedRangeDelIterator::SetRangeDelReadSeqno() and LevelIterator::SetRangeDelReadSeqno()`. This change also fixes an issue where range tombstone iterators created by LevelIterator may access ReadOptions::snapshot, even though we do not explicitly require users to keep a snapshot alive after creating an Iterator. Pull Request resolved: #10594 Test Plan: * New unit tests. * Add Iterator::Refresh(snapshot) to stress test. Note that this change only adds tests for refreshing to the same snapshot since this is the main target use case. TODO in a following PR: * Stress test Iterator::Refresh() to different snapshots or no snapshot. Reviewed By: ajkr Differential Revision: D48456896 Pulled By: cbi42 fbshipit-source-id: 2e642c04e91235cc9542ef4cd37b3c20823bd779
Expected behavior
Flushed MemTables and compacted SSTs should be deleted.
Actual behavior
Long lived Iterators referring
Version
objects which referring MemTables and compacted SSTs which lead to memory over consumption while writing to DB.Steps to reproduce the behavior
Create an iterator and using it while writing to DB, newly created MemTables can not be freed.
We noticed this issue when we running sysbench on myrocks, we also have filed an issue and PR for MyRocks, we worked around this issue by creating a new Iterator and delete old Iterator periodically. It is better to resolve this issue in RocksDB.
The text was updated successfully, but these errors were encountered: