-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: commitPrepStmtNamespace is allocation heavy and almost never needed #32427
Comments
@jordanlewis @petermattis which team / project is this landing in? |
That code is in no-man's land. I think the SQL execution team might have more bandwidth for this, though. @jordanlewis feel free to push back if you disagree. |
We can take this on eventually, though it doesn't seem particularly high priority, and I wouldn't make a claim about having any extra bandwidth. |
There's a 6.4% speedup sitting here for sysbench's |
Fixes cockroachdb#32427. This change reworks the handling of `prepStmtsNamespace` and `prepStmtsNamespaceAtTxnRewindPos` to avoid cloning the former indiscriminately after every statement. This clone was very expensive because it resulted in a series of map allocations and memory copies. The new approach avoids the cloning in almost all cases by performing the copy only when it is needed. In the past, this would translate to cases where `prepStmtsNamespace` and `prepStmtsNamespaceAtTxnRewindPos` actually differed. It does so by introducing a new `checkpointedPrepStmtNamespace` type which abstracts away the details of taking snapshots of the namespace and rewinding when necessary. The change also adds a fast-path that I've wanted for a while which allows us to avoid map accesses entirely for the unnamed prepared statement/portal. A large number of drivers only use the unnamed portal by default, so this should provide a nice speedup. Release note: None
I took a shot at this in this branch: @andreimatei this got me thinking about whether we have the wrong structure here. We need to be able to roll back prepared statements during a txn retry, but do we need to be able to roll back portals? Will clients ever rely on the result of a previously prepared statement binding instead of just rebinding every time they want to use a prepared statement? If not then we could constrain the namespace checkpointing to just prepared statements and fix this easily. If we do need to be able to roll back portals then we'll need to rework the |
I don't know; I don't know what the lifetimes of both portals or prepared statements is, and the docs unfortunately are not very clear. |
From https://www.postgresql.org/docs/10/protocol-flow.html:
So it looks like we're already in violation of this. It hasn't seemed to cause any issues though, but I'm sure it would if clients were expecting portals to be cleaned up after txns, which could lead to an unbounded number of portals. This gives us some flexibility though, as it implies that portals don't need to persist through txn retries. |
We have marked this issue as stale because it has been inactive for |
This is still relevant it seems. |
133971: raft: enable store liveness in raft unit tests phase 2 r=iskettaneh a=iskettaneh This commit modifies the following tests to allow them to run in both store liveness enabled/disabled configurations: 1) TestPreVoteMigrationCanCompleteElection 2) TestPreVoteMigrationWithFreeStuckPreCandidate 3) TestAddNodeCheckQuorum 4) TestCandidateResetTermMsgHeartbeat 5) TestCandidateResetTermMsgApp Note that these tests create their own Raft instances instead of relying on the network te create ones. References: #132241 Release note: None 134183: sql: don't copy maps in `prepStmtNamespace.resetTo` if identical r=nvanbenschoten a=nvanbenschoten This commit optimizes prepStmtNamespace.resetTo for the common case under commitPrepStmtNamespace where the prepStmtNamespace maps are identical. In these cases, deleting and re-inserting into the hash maps can cause hash map resizing and incur heap allocations. Longer term, we should reconsider the relationship between `prepStmtsNamespace` and `prepStmtsNamespaceAtTxnRewindPos`, ideally using a scheme that allows for a constant-time "reset" operation in the common case. This could be accomplished using a copy-on-write scheme or by maintaining some extra state about how the two have diverged since the last reset. For now, we get 90% of the way there by testing for equality before copying maps. ``` name old time/op new time/op delta Sysbench/SQL/oltp_read_only 36.2ms ± 3% 36.7ms ± 3% ~ (p=0.095 n=10+9) Sysbench/SQL/oltp_write_only 19.0ms ± 2% 19.1ms ± 1% ~ (p=0.297 n=9+9) Sysbench/SQL/oltp_read_write 60.1ms ± 6% 58.2ms ± 4% -3.15% (p=0.004 n=9+10) Sysbench/SQL/oltp_point_select 1.61ms ± 2% 1.57ms ± 2% -2.24% (p=0.001 n=9+9) Sysbench/SQL/oltp_begin_commit 587µs ± 2% 543µs ± 2% -7.58% (p=0.000 n=9+9) name old alloc/op new alloc/op delta Sysbench/SQL/oltp_read_only 1.05MB ± 0% 1.05MB ± 0% ~ (p=0.730 n=9+9) Sysbench/SQL/oltp_write_only 403kB ± 1% 398kB ± 1% -1.17% (p=0.001 n=10+9) Sysbench/SQL/oltp_read_write 1.51MB ± 1% 1.49MB ± 0% -1.20% (p=0.000 n=10+8) Sysbench/SQL/oltp_point_select 35.0kB ± 0% 31.7kB ± 0% -9.28% (p=0.000 n=10+10) Sysbench/SQL/oltp_begin_commit 18.8kB ± 0% 14.8kB ± 0% -21.29% (p=0.000 n=9+9) name old allocs/op new allocs/op delta Sysbench/SQL/oltp_read_only 6.94k ± 0% 6.95k ± 1% ~ (p=0.348 n=9+9) Sysbench/SQL/oltp_write_only 3.62k ± 1% 3.61k ± 0% ~ (p=0.067 n=10+9) Sysbench/SQL/oltp_read_write 11.0k ± 1% 10.9k ± 1% -0.91% (p=0.000 n=9+9) Sysbench/SQL/oltp_point_select 317 ± 0% 309 ± 0% -2.37% (p=0.000 n=10+10) Sysbench/SQL/oltp_begin_commit 139 ± 0% 130 ± 0% -6.74% (p=0.000 n=10+10) ``` Informs: #32427. Epic: None Release note: None Co-authored-by: Ibrahim Kettaneh <[email protected]> Co-authored-by: Nathan VanBenschoten <[email protected]>
Closing since this is an old issue, and it seems that #134183 addressed the current known improvement. Since performance optimization is an active area of effort, this will resurface through future testing if needed. |
Above is a screenshot of a CPU profile from a
kv95
run on a single-nodem5d.12xlarge
cluster. We can see thatcommitPrepStmtNamespace
is responsible for 2.48% of CPU utilization. This is pretty insane considering there's nothing in the prepared stmt namespace that needs rolling back.We should page this
prepStmtNamespace
stuff back in (including its interaction with the pgwire protocol) and then audit all uses of it. We should then make it cheaper to use, introduce fast-paths that avoid touching it at all when not necessary, or do some combination of both.cc. @knz @andreimatei
Jira issue: CRDB-4747
Epic CRDB-42584
The text was updated successfully, but these errors were encountered: