Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql/kv: don't step external read snapshot on volatile UDF execution #104847

Closed
nvanbenschoten opened this issue Jun 14, 2023 · 6 comments · Fixed by #108735
Closed

sql/kv: don't step external read snapshot on volatile UDF execution #104847

nvanbenschoten opened this issue Jun 14, 2023 · 6 comments · Fixed by #108735
Assignees
Labels
A-read-committed Related to the introduction of Read Committed A-sql-routine UDFs and Stored Procedures branch-release-23.2 Used to mark GA and release blockers, technical advisories, and bugs for 23.2 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. GA-blocker T-sql-queries SQL Queries Team

Comments

@nvanbenschoten
Copy link
Member

nvanbenschoten commented Jun 14, 2023

In #100133 / #104362, we added support for per-statement read snapshots under Read Committed by updating the Txn.Step API to advance the transaction's read timestamp. This is correct for most callers of the API, but it appears that the following caller only wants to advance the txn's visibility of its own writes, not all other writes:

// Place a sequence point before each statement in the routine for
// volatile functions.
if g.expr.EnableStepping {
if err := txn.Step(ctx); err != nil {
return err
}
}

This might motivate an expansion of the Step API. For example, we may want it to take a flag that indicates whether only the txn's internal read seq should be advanced or whether its internal read seq and its external read timestamp should be advanced (assuming Read Committed).

Jira issue: CRDB-28746

@nvanbenschoten nvanbenschoten added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-queries SQL Queries Team A-sql-routine UDFs and Stored Procedures A-read-committed Related to the introduction of Read Committed labels Jun 14, 2023
@mgartner
Copy link
Collaborator

This is correct for most callers of the API, but it appears that the following caller only wants to advance the txn's visibility of its own writes, not all other writes:

I don't believe this is true, unless I'm misunderstanding what you mean. Future invocations of the UDF (or other UDFs) need to see all the writes of previous invocations of the UDF. Here's an example in Postgres:

CREATE TABLE filter_counter (
  i INT
);
INSERT INTO filter_counter VALUES (0);

CREATE FUNCTION incr_filter_counter() RETURNS BOOL LANGUAGE SQL AS $$
  UPDATE filter_counter SET i = i + 1;
  SELECT true;
$$;

SELECT * FROM generate_series(1, 5) WHERE incr_filter_counter();

SELECT * FROM filter_counter;
--  i
-- ---
--  5
-- (1 row

If each invocation of incr_filter_counter did not see previous writes, the final value of filter_counter.i would be 1, not 5.

@michae2
Copy link
Collaborator

michae2 commented Jun 20, 2023

I don't believe this is true, unless I'm misunderstanding what you mean. Future invocations of the UDF (or other UDFs) need to see all the writes of previous invocations of the UDF. Here's an example in Postgres:

I think future invocations of the UDF would still see all the writes of previous invocations from the same txn. (This issue is about the UDF seeing writes from other txns.)

@mgartner
Copy link
Collaborator

That should only be an issue under Read-Committed isolation, correct?

@nvanbenschoten
Copy link
Member Author

nvanbenschoten commented Jun 21, 2023

Right, when invoking volatile UDFs, all transactions will Step and see prior writes from the same transaction. So the example from #104847 (comment) will work in all isolation levels because the UPDATEs are all part of the same transaction.

The question here is whether Read Committed txns (and only RC txns) should also step their external read snapshot when invoking volatile UDFs, so that each UDF gets an updated view of all other transactions in the system each time it is run. My understanding is that this is how PG works (relevant source code), but to do so they then play a subtle dance of snapshot management to restore the prior read snapshot when back in the calling statement's context.

There are a few reasons why we might want to diverge from this behavior:

  • it's unclear whether applications actually need this behavior
  • it's odd that the behavior of volatile UDFs would differ depending on the isolation level. If someone is actually relying on this external visibility, then their UDF will only work under RC and would be entirely broken under stronger isolation levels
  • it would be quite difficult to support this behavior. We won't easily be able to restore a prior snapshot once the transaction's external read snapshot has been advanced. This would require a lot of infrastructure that we don't currently have
  • unlike PG, CRDB is going to provide a true consistent read snapshot across an entire statement. In such a world, does it make sense for UDFs to break this snapshot? In other words, is a divergence here just a continuation of our primary deviation from PG, or something new?

Answering the first question is probably the place to start. Are we aware of any UDFs that rely on seeing newly committed rows from other transactions ("new" relative to the start of their current statement)? The only example I've been able to come up with that might make some sense is a "succeeds soon" condition where a statement hangs in a backoff loop, repeatedly executing a volatile UDF until it observes some other txn commit.

@mgartner mgartner moved this to 23.2 Release in SQL Queries Jul 24, 2023
@blathers-crl
Copy link

blathers-crl bot commented Oct 17, 2023

Hi @michae2, please add branch-* labels to identify which branch(es) this release-blocker affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@michae2 michae2 added the branch-release-23.2 Used to mark GA and release blockers, technical advisories, and bugs for 23.2 label Oct 17, 2023
@michae2
Copy link
Collaborator

michae2 commented Nov 3, 2023

Answering the first question is probably the place to start. Are we aware of any UDFs that rely on seeing newly committed rows from other transactions ("new" relative to the start of their current statement)? The only example I've been able to come up with that might make some sense is a "succeeds soon" condition where a statement hangs in a backoff loop, repeatedly executing a volatile UDF until it observes some other txn commit.

I think this is exactly the type of use we will (eventually) find in the wild. Some volatile UDF being called in a PLpgSQL loop waiting for another transaction to commit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-read-committed Related to the introduction of Read Committed A-sql-routine UDFs and Stored Procedures branch-release-23.2 Used to mark GA and release blockers, technical advisories, and bugs for 23.2 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. GA-blocker T-sql-queries SQL Queries Team
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants