kvserver: timeout during export request on 16gb range #107519

Open
tbg opened this issue Jul 25, 2023 · 3 comments
Labels
A-disaster-recovery C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. db-cy-23 T-disaster-recovery

Comments

@tbg
Member

tbg commented Jul 25, 2023

Describe the problem

In #104588 we're seeing a backup fail to back up a 16GiB range. I've learned that ExportRequest paginates on 16MB worth of returned values; in this case it was possibly an incremental backup that didn't find anything new and so had to scan the entire 16GiB in a single request, which is no bueno - very expensive.

While we don't endorse let alone support 16GiB ranges, it stands to reason that
backup should be able to back up ranges of any size, as sometimes ranges may grow
to that size without the operator being at fault.

Also, we are entertaining the idea of increasing the default range sizes significantly,
which will likely put this issue on the menu at least in some deployments.

So we should find a way to paginate on the "bytes processed" and not "bytes returned".
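
To make the difference concrete, here is a minimal, self-contained sketch of the two pagination modes. It is a toy model, not the actual ExportRequest code: `KV`, `Page`, `exportPage`, `exportAll` and the `maxProcessed` parameter are all hypothetical names made up for illustration, and the sizes are scaled down.

```go
package main

import "fmt"

// KV is a hypothetical stand-in for one stored version in a range.
type KV struct {
	Key       string
	Timestamp int64 // write timestamp
	Size      int64 // bytes occupied by this version
}

// Page is the result of one paginated export call (hypothetical type).
type Page struct {
	Returned  []KV
	Processed int64 // bytes scanned, whether returned or not
	Resume    int   // index to resume from; -1 when the scan is complete
}

// exportPage scans kvs from index start and returns versions newer than since.
// It stops once maxReturned bytes were collected or, if maxProcessed > 0,
// once maxProcessed bytes were scanned.
func exportPage(kvs []KV, start int, since, maxReturned, maxProcessed int64) Page {
	p := Page{Resume: -1}
	var returned int64
	for i := start; i < len(kvs); i++ {
		kv := kvs[i]
		p.Processed += kv.Size
		if kv.Timestamp > since {
			p.Returned = append(p.Returned, kv)
			returned += kv.Size
		}
		more := i+1 < len(kvs)
		if more && (returned >= maxReturned || (maxProcessed > 0 && p.Processed >= maxProcessed)) {
			p.Resume = i + 1
			break
		}
	}
	return p
}

// exportAll drives exportPage to completion. It reports how many requests
// were needed and the largest number of bytes scanned by any single request.
func exportAll(kvs []KV, since, maxReturned, maxProcessed int64) (requests int, maxScanned int64) {
	for start := 0; start != -1; {
		p := exportPage(kvs, start, since, maxReturned, maxProcessed)
		requests++
		if p.Processed > maxScanned {
			maxScanned = p.Processed
		}
		start = p.Resume
	}
	return requests, maxScanned
}

func main() {
	// 1000 old versions of 16 bytes each; none is newer than since=100,
	// mimicking an incremental backup that finds nothing new.
	kvs := make([]KV, 1000)
	for i := range kvs {
		kvs[i] = KV{Key: fmt.Sprintf("k%04d", i), Timestamp: 1, Size: 16}
	}

	// Limit on returned bytes only: one request scans everything.
	r, m := exportAll(kvs, 100, 160, 0)
	fmt.Println(r, m) // prints "1 16000"

	// Additional limit on processed bytes: work per request is bounded.
	r, m = exportAll(kvs, 100, 160, 160)
	fmt.Println(r, m) // prints "100 160"
}
```

In this model, a limit on returned bytes alone never triggers when nothing matches, so a single request has to walk the whole range. With a processed-bytes limit, the same scan is split into many short requests, each of which would fit under a per-request timeout.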

To Reproduce

Presumably doing what the linked roachtest does to get the large range and then
trying to back up the table will reproduce it.

Related

#103879 is about a similar issue when sending snapshots.

Jira issue: CRDB-30090

@tbg added the C-bug and T-disaster-recovery labels Jul 25, 2023
@blathers-crl

blathers-crl bot commented Jul 25, 2023

cc @cockroachdb/disaster-recovery

@dt
Member

dt commented Jul 25, 2023

@tbg The 5min timeout is per request, not per range, and regardless of how large the range is, any given request is sent with a 16MB pagination size limit. Why does a 16gb range take longer than a 512mib range to read the same 16mb?

@tbg changed the title from "backup: timeout prevents backing up tables with large range" to "kvserver: timeout during export request on 16gb range" Jul 25, 2023
@tbg
Member Author

tbg commented Jul 25, 2023

I updated the issue to say that the pagination should be based on bytes processed, not bytes returned - you are probably right that it's an incremental that doesn't return ~anything and so has to read 16gb. Feel free to retitle, adjust, etc!

@tbg added the db-cy-23 label Jul 25, 2023
2 participants