distsender: fix ResumeSpan for Gets #75475
Note, in versions before 21.2, we were never issuing Gets when scanning tables, so this was not causing any problems.
Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @rharding6373)
TFTR! I will spend a bit of time seeing if I can make the SQL-level repro fast enough for a logictest.
a913c71 to 4fab977
Added a logic test.
Thanks for the fix!
4fab977 to 46fa17b
Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde and @rharding6373)
pkg/sql/logictest/testdata/logic_test/distsql_agg, line 673 at r2 (raw file):
SELECT every(col2) FROM table58683_1 JOIN table58683_2 ON col1 = (table58683_2.tableoid)::INT8 GROUP BY col2 HAVING bool_and(col2);

# Regression test for #745736 - missing Get results when:
nit: I'm not sure when we'll have issue numbers in 700k range :)
pkg/sql/logictest/testdata/logic_test/distsql_agg, line 681 at r2 (raw file):
ALTER TABLE table74736 SPLIT AT VALUES (1000000);
ALTER TABLE table74736 EXPERIMENTAL_RELOCATE VALUES (ARRAY[1], 0), (ARRAY[2], 1000000);
INSERT INTO table74736 SELECT x * 10000, repeat('a', 500000) FROM generate_series(1, 130) AS g(x);
nit: I think we could lower the size of the blob to 200000 to speed up the test. Target bytes limit is 10MiB, so we'll be able to fit 50 out of 90 rows for the query into the first BatchResponse.
46fa17b to 76f6ad6
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @rharding6373 and @yuzefovich)
pkg/sql/logictest/testdata/logic_test/distsql_agg, line 673 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: I'm not sure when we'll have issue numbers in 700k range :)
Give it a couple of years :)
pkg/sql/logictest/testdata/logic_test/distsql_agg, line 681 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: I think we could lower the size of the blob to 200000 to speed up the test. Target bytes limit is 10MiB, so we'll be able to fit 50 out of 90 rows for the query into the first BatchResponse.
Done. Confirmed it still repros. It was under 1s anyway.
The distsender is not fully implementing the ResumeSpan contract for Get; the promise is that a nil span is set when the operation has successfully completed.

A Get that reaches a kvserver but that is not executed has the ResumeSpan set on the server side. But if the requests span multiple ranges, a subset of the requests will not make it to any kvserver.

Any Gets that weren't executed and don't have a ResumeSpan will not be executed again (it looks to the upper layer as if the Get was executed and found no key). The end result is that scans can miss rows.

This change fills in the missing case and improves the TestMultiRangeScanWithPagination test to allow running mixed operations.

Fixes cockroachdb#74736.

Release note (bug fix): In particular cases, some queries that involve a scan which returns many results and which includes lookups for individual keys were not returning all results from the table.
76f6ad6 to bc98764
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @yuzefovich)
bors r+
This PR was included in a batch that was canceled, it will be automatically retried
Build failed (retrying...):
Build succeeded:
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool.
error creating merge commit from bc98764 to blathers/backport-release-21.2-75475: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []
you may need to manually resolve merge conflicts with the backport tool.
Backport to branch 21.2.x failed. See errors above.
🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.