backupccl: breakup the txn that inserts stats during cluster restore #75969
Conversation
I can add some checkpointing to make sure we only insert table stats once, but that would involve a proto change. I wanted to get y'all's temperature on backporting this form of the fix.
Force-pushed from 74ed9bd to 9f5843e
Hmmm, for a second I thought that breaking up the txn means that we could have a PK collision if the job were to be re-resumed. The stats table has a PK on statsID, tableID, but fortunately, the
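The re-resume concern above is an idempotency question: once the single txn is split into batches, a job resumed midway could try to re-insert rows an earlier resumption already committed, colliding on the primary key. A minimal sketch of that hazard and the skip-if-present remedy, using an in-memory map in place of the real table (the `statKey` type and `insertIfAbsent` helper are hypothetical, not the actual backupccl code):

```go
package main

import "fmt"

// statKey mirrors the (statsID, tableID) primary key mentioned above;
// the field names here are illustrative only.
type statKey struct {
	tableID int
	statID  int
}

// insertIfAbsent skips rows already written by a previous resumption
// instead of colliding on the primary key, and reports how many new
// rows were inserted.
func insertIfAbsent(written map[statKey]bool, rows []statKey) int {
	inserted := 0
	for _, r := range rows {
		if written[r] {
			continue // already committed by an earlier resume
		}
		written[r] = true
		inserted++
	}
	return inserted
}

func main() {
	written := map[statKey]bool{}
	rows := []statKey{{1, 1}, {1, 2}}
	fmt.Println(insertIfAbsent(written, rows)) // first resume: 2
	fmt.Println(insertIfAbsent(written, rows)) // re-resume: 0
}
```

In SQL terms this corresponds to an insert that tolerates existing rows rather than erroring on a duplicate key.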
Overall it looks good to me.
We have seen instances of restores with hundreds of tables getting stuck on inserting the backed-up table stats into the system.table_stats table on the restoring cluster. Previously, we would issue insert statements for each table stat row in a single, long-running txn. If this txn were to be retried a few times, we would observe intent buildup on the system.table_stats ranges. Once these intents exceeded the `max_intent_bytes` on the cluster, every subsequent txn retry would fall back to the much more expensive ranged intent resolution. The only remedy at this point would be to delete the BACKUP-STATISTICS file from the bucket where the backup resides and restore the tables with no stats, relying on the AUTO STATS job to rebuild them gradually.

This change "batches" the insertion of the table stats to prevent the above situation.

Fixes: cockroachdb#69207

Release note: None
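The batching described above can be sketched as follows: chunk the stat rows and insert each chunk in its own short-lived txn, so no single transaction accumulates enough intents to trip the cluster's intent-bytes threshold. This is a hypothetical helper, not the actual backupccl implementation; `TableStat` and `batchStats` are illustrative names:

```go
package main

import "fmt"

// TableStat is a stand-in for a backed-up statistics row.
type TableStat struct {
	TableID int
	StatID  int
}

// batchStats splits stats into chunks of at most batchSize. Each chunk
// would then be inserted in its own short transaction, rather than
// issuing every insert inside one long-running txn.
func batchStats(stats []TableStat, batchSize int) [][]TableStat {
	var batches [][]TableStat
	for len(stats) > 0 {
		n := batchSize
		if len(stats) < n {
			n = len(stats)
		}
		batches = append(batches, stats[:n])
		stats = stats[n:]
	}
	return batches
}

func main() {
	stats := make([]TableStat, 10)
	for i := range stats {
		stats[i] = TableStat{TableID: i, StatID: 1}
	}
	fmt.Println(len(batchStats(stats, 4))) // 10 rows in batches of 4 -> 3
}
```

The trade-off is that the insert is no longer atomic across all tables, which is exactly what raises the re-resume/idempotency question discussed in the review comments.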
Force-pushed from 9f5843e to 568c463
TFTR! bors r=stevendanna

Build succeeded.
blathers backport 21.2

Encountered an error creating backports. Some common things that can go wrong:

You might need to create your backport manually using the backport tool.

error creating merge commit from 568c463 to blathers/backport-release-21.2-75969: POST https://api.github.com/repos/cockroachlabs/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 21.2 failed. See errors above.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.