Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: add support to the InternalExecutor to execute multiple SQL queries #71467

Closed
adityamaru opened this issue Oct 12, 2021 · 12 comments
Closed
Assignees
Labels
A-disaster-recovery C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@adityamaru
Copy link
Contributor

adityamaru commented Oct 12, 2021

Today, the InternalExecutor exposes several methods such as ExecEx, QueryRowEx that take a single SQL statement and execute it using a connExecutor. A new executor is initialized for every statement and is closed once query execution completes. One consequence of this is that an internal executor cannot maintain extraTxnState across multiple SQL queries, and so we cannot run multiple statements in a txn using an internal executor.

This issue tracks work to expand the internal executor to support multiple queries that can be run in an explicit txn, or in a higher-level SQL transaction. For the most part this would require having a single, long-lived connExectuor that reads multiple statements, pushed by the ie, via the stmtBuf.

A motivation for this change is to prototype #67076 (comment).

Epic CRDB-14492
Jira issue: CRDB-10591

@adityamaru adityamaru added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-disaster-recovery labels Oct 12, 2021
@adityamaru
Copy link
Contributor Author

@ajwerner @andreimatei @knz going out on a limb and tagging y'all as folks who know this part of the code better. Are there any obvious blockers that I might have missed? I'll keep digging and fleshing out a strawman.

@miretskiy
Copy link
Contributor

Can the multiple statement execution happen via outside Db.Txn with a loop that executes multiple statements?

@andreimatei
Copy link
Contributor

Consider adding session support in the form of a gosql driver implementation that talks to the InternalExecutor, or without using the InternalExecutor at all, but by making it easy to connect to the local sql node through a specialized net.Conn interface that we'd hand to pgx.

@ajwerner
Copy link
Contributor

@RichardJCai this is timely give the discussions yesterday on #69495.

@knz
Copy link
Contributor

knz commented Oct 12, 2021

having a single, long-lived connExectuor that reads multiple statements

You mean a single per usage point?

We can't have a single isntance of connExecutor for IE needs of a SQL server: there are many goroutines that use the IE concurrently. Each of them needs a different instance.

So what would be the lifecycle of an IE?

@adityamaru
Copy link
Contributor Author

Can the multiple statement execution happen via outside Db.Txn with a loop that executes multiple statements?

I think the main shortcoming is that this approach does not maintain extraTxnState across statements -

extraTxnState struct {
. This is only maintained if we stop passing in a transaction to the ie and instead bind it to the transaction of the connExecutor. Whether this matters for IMPORT is still something I don't have a clear vision of.

@adityamaru
Copy link
Contributor Author

adityamaru commented Oct 12, 2021

You mean a single per usage point?

Yup, that's right I misspoke.

So what would be the lifecycle of an IE?

I'm interested in further discussing @ajwerner's suggestion #71246 (comment) of injecting information to tighter bind the IE to the current state of the parent connExecutor.

It raises the question of whether we want, one IE that initializes a connExecutor with some state, runs multiple statements, and closes the connExecutor once it is done with that instance of the IE. Versus, if we stick with the one connExecutor per statement execution, but with the ability to pass state across multiple IEs. My sense is the former is easier to reason about.

Both the above options do rely on us hanging a factory of the execCfg rather than an instance of an IE that is shared across the system to execute queries.

@andreimatei
Copy link
Contributor

andreimatei commented Oct 12, 2021

I'm interested in further discussing @ajwerner's suggestion #71246 (comment) of injecting information to tighter bind the IE to the current state of the parent connExecutor.

It raises the question of whether we want, one IE that initializes a connExecutor with some state, runs multiple statements, and closes the connExecutor once it is done with that instance of the IE. Versus, if we stick with the one connExecutor per statement execution, but with the ability to pass state across multiple IEs. My sense is the former is easier to reason about.

An important case is when there is no "parent connExecutor". I'm just a random module that want to execute a multi-statement txn against the cluster. It should be E-Z to do that - and today it isn't. For this case, I think the experience we should thrive for is exactly the experience a client application outside of CRDB has: use of the gosql package.

In the cases where there is a parent connExecutor (and thus some internal statements need to be mixed with a parent session), there I think the needs vary case-by-case. Do we really have a need for running multiple statements in this case?

@andreimatei
Copy link
Contributor

Can someone explain what in what way this was done? I don't see something obvious in the InternalExecutor's interface.

@andreimatei andreimatei reopened this Aug 29, 2022
@ZhouXing19
Copy link
Collaborator

ZhouXing19 commented Aug 29, 2022

HI @andreimatei, in #82477 we modified how the conn executor inherits and processes txn-related metadata from the outer caller of the internal executor. With those changes, now an internal executor can be used to run multiple sql queries, but only if they are initialized with the new interfaces we introduced.

Tests to run multiple queries with an Internal Executor in descs.TxnWithExecutor() (one of the new interfaces) are also added here and here.

@rafiss rafiss closed this as completed Aug 29, 2022
@andreimatei
Copy link
Contributor

Some new interfaces were added, but it seems to me that we're far from being able to use the InternalExecutor in a conversational way in general, in the way that one uses a sql connection.
One can't do:

ie.Exec("SET varfoo=bar")
ie.Exec("BEGIN")
ie.Exec("SELECT * FROM foo")
ie.Exec("UPDATE...")
ie.Exec("COMMIT")

Given how reasonable the expectation that using the database from inside it should not be a lot harder than using it from the outside is, I think we should keep this issue open for this goal.

@ZhouXing19
Copy link
Collaborator

we're far from being able to use the InternalExecutor in a conversational way in general

I agree that the current internal executor is not versatile, but I wonder how much we want to expect from an internal executor, how far we can possibly go with it, and is it possible to sheer to other ways (such as using the gosql package) for more flexible functionalities?

Also, though we're still far, maybe it'd be helpful if we can break this issue into smaller ones to track the progress.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-disaster-recovery C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Projects
None yet
Development

No branches or pull requests

7 participants