Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: index recommendation code can cause log spam and errors #85875

Closed
ajwerner opened this issue Aug 10, 2022 · 6 comments · Fixed by #85907
Closed

sql: index recommendation code can cause log spam and errors #85875

ajwerner opened this issue Aug 10, 2022 · 6 comments · Fixed by #85907
Assignees
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.

Comments

@ajwerner
Copy link
Contributor

ajwerner commented Aug 10, 2022

Describe the problem

I ran an unrelated test and see:

W220810 04:22:35.670649 5728 sql/instrumentation.go:711  [n1,intExec=expire-sessions] 1482  unable to build memo: cannot use CheckPrivilege without a txn
W220810 04:22:35.683731 8487 sql/instrumentation.go:711  [n1,intExec=expire-sessions] 1483  unable to build memo: cannot use CheckPrivilege without a txn
W220810 04:22:35.697454 8489 sql/instrumentation.go:711  [n1,intExec=expire-sessions] 1484  unable to build memo: cannot use CheckPrivilege without a txn
W220810 04:22:35.709887 8456 sql/instrumentation.go:711  [n1,intExec=expire-sessions] 1485  unable to build memo: cannot use CheckPrivilege without a txn
W220810 04:22:35.722901 8516 sql/instrumentation.go:711  [n1,intExec=expire-sessions] 1486  unable to build memo: cannot use CheckPrivilege without a txn
W220810 04:22:35.736447 8518 sql/instrumentation.go:711  [n1,intExec=expire-sessions] 1487  unable to build memo: cannot use CheckPrivilege without a txn
W220810 04:22:35.748243 8520 sql/instrumentation.go:711  [n1,intExec=expire-sessions] 1488  unable to build memo: cannot use CheckPrivilege without a txn
W220810 04:22:35.759527 8493 sql/instrumentation.go:711  [n1,intExec=expire-sessions] 1489  unable to build memo: cannot use CheckPrivilege without a txn
W220810 04:22:35.772222 8463 sql/instrumentation.go:711  [n1,intExec=expire-sessions] 1490  unable to build memo: cannot use CheckPrivilege without a txn
W220810 04:22:35.783008 8526 sql/instrumentation.go:711  [n1,intExec=expire-sessions] 1491  unable to build memo: cannot use CheckPrivilege without a txn
W220810 04:22:35.793800 8528 sql/instrumentation.go:711  [n1,intExec=expire-sessions] 1492  unable to build memo: cannot use CheckPrivilege without a txn

I think there's a few things going on here:

  1. I think the internal executor is in some way not set up for the index recommendation code.
  2. I think we need to have the cache update the last generation time when we're deciding to return true as opposed to when we add the new recommendation -- otherwise if the index recommendation code starts failing, we'll run it every time the query is hit. Already I think there's a problem that if you run a query at a high frequency then once you hit the threshold, we'll generate the recommendations in parallel.

To Reproduce

Run a test that sets short job intervals.

Jira issue: CRDB-18470

@ajwerner ajwerner added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Aug 10, 2022
@maryliag maryliag self-assigned this Aug 10, 2022
@RichardJCai
Copy link
Contributor

cc @rafiss

I think this might be related to the case where we get an error but only log it instead of returning it, thus the txn becomes invalid?

@rafiss
Copy link
Collaborator

rafiss commented Aug 10, 2022

I think this might be related to the case where we get an error but only log it instead of returning it, thus the txn becomes invalid?

+1 was going to point that out too. Here's an example where that happened before: #85454

Is there any part of this code that uses a transaction/internalExecutor to do something, but then ignores/logs the error without returning it to the txn runner?

@maryliag
Copy link
Contributor

This is the part of the code that is causing the error

evalCtx := opc.p.EvalContext()
oldCtx := evalCtx.Context
evalCtx.Context = ctx
defer func() {
evalCtx.Context = oldCtx
}()
f.Init(evalCtx, &opc.catalog)
f.FoldingControl().AllowStableFolds()
bld := optbuilder.New(ctx, &opc.p.semaCtx, evalCtx, &opc.catalog, f, opc.p.stmt.AST)
err := bld.Build()
if err != nil {
log.Warningf(ctx, "unable to build memo: %s", err)
} else {
err = opc.makeQueryIndexRecommendation(ctx)
if err != nil {
log.Warningf(ctx, "unable to generate index recommendations: %s", err)
} else {
idxRec.UpdateIndexRecommendations(
ih.fingerprint,
ih.planGist.Hash(),
planner.SessionData().Database,
stmtType,
ih.indexRecommendations,
true,
)
}
}

that is the first time using the executor (on this new function) and we add the log. That functions is called here

planner.instrumentation.SetIndexRecommendations(ctx, ex.server.idxRecommendationsCache, planner)

@rafiss
Copy link
Collaborator

rafiss commented Aug 10, 2022

Is it intentional that a nil txn is passed into ExecEx? I'd guess not, since otherwise why are we using the IE inside of a db.Txn?

		if err := r.db.Txn(ctx, func(ctx context.Context, txn *kv.Txn) error {
			// Run the expiration transaction at low priority to ensure that it does
			// not contend with foreground reads. Note that the adoption and cancellation
			// queries also use low priority so they will interact nicely.
			if err := txn.SetUserPriority(roachpb.MinUserPriority); err != nil {
				return errors.WithAssertionFailure(err)
			}
			_, err := r.ex.ExecEx(
				ctx, "expire-sessions", nil,
				sessiondata.InternalExecutorOverride{User: username.RootUserName()},
				removeClaimsForDeadSessionsQuery,
				s.ID().UnsafeBytes(),
				cancellationsUpdateLimitSetting.Get(&r.settings.SV),
			)
			return err
		})

@rafiss
Copy link
Collaborator

rafiss commented Aug 10, 2022

OK I have a better idea now, my previous comment definitely seems like it's a bug (i.e. the change in aed014f doesn't seem like it fully did what we want cc @ajwerner), but I don't think that's the problem here.

The problem is that that SetIndexRecommendations is called here:

planner.instrumentation.SetIndexRecommendations(ctx, ex.server.idxRecommendationsCache, planner)

But at this point the transaction may already have committed, probably due to the 1PC fast path optimization.

@ajwerner
Copy link
Contributor Author

But at this point the transaction may already have committed, probably due to the 1PC fast path optimization.

It also could be aborted due to a serializability or deadlock error.

maryliag added a commit to maryliag/cockroach that referenced this issue Aug 10, 2022
Previously, if a transaction (from planner.txn) was already closed,
due to one-phase commit optimization or an error, we were still
trying to generate a recommendation, causing log spam.
This commit now checks if we have a transaction open that can be
used, otherwise it won't try to generate recommendation.

This commit also updates the cache, even if the recommendation
generations fails, this way we won't keep trying constantly,
and instead will wait for the next hour to try again.

Fixes cockroachdb#85875

Release note (bug fix): Do a proper check if a transaction is
open, before trying to generate index recommendations.
craig bot pushed a commit that referenced this issue Aug 11, 2022
85907: sql: fix log spam during idx recommendation if txn was closed r=maryliag a=maryliag

Previously, if a transaction (from planner.txn) was already closed,
due to one-phase commit optimization or an error, we were still
trying to generate a recommendation, causing log spam.
This commit now checks if we have a transaction open that can be
used, otherwise it won't try to generate recommendation.

This commit also updates the cache, even if the recommendation
generations fails, this way we won't keep trying constantly,
and instead will wait for the next hour to try again.

Fixes #85875

Release note: None

85921: sql: use the correct txn when creating external connection r=benbardin a=adityamaru

We need to use the planners' txn when creating an external
connection. Using a DB().Txn here was an oversight.

Release note: None

Co-authored-by: Marylia Gutierrez <[email protected]>
Co-authored-by: Aditya Maru <[email protected]>
@craig craig bot closed this as completed in 477d1e8 Aug 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants