-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
*: internal executor paradigm is bad for global latency (parallel txn) #60968
Comments
Adding fuel to the fire here. I'm trying to optimize COPY and I don't think I can do a decent job w/o parallel writes. For a well dressed table like tpch.lineitem (8 indexes) I think we'll only achieve meaningful speedups with parallel writes. Parsing strings into datums is ~%5 of the time, massaging datums into kv requests is like ~%10 and %85 is kv (%5 kvclient/%20 kvserver/%50 storage/%10 "other"). So to get meaningful speedups I need to push parallel kv requests. IMPORT gets the job done by slicing and dicing and building SSTs which works at big scales but we want small and medium sized COPYs to be fast and be able to overlap with existing keys. We also don't want COPY go crazy with concurrency and resource consumption for chunks of work that don't justify all the coordination. So current thinking is that a nice middle ground would be to use separate goroutines to write to primary table and each index. This should minimize work spent splitting up kv batches into replica traffic and maximize speedups from overlapping writes to separate replicas. No reason not to exploit same approach for all batched inserts I think. The only alternative is to go back to the ugly days of non-atomic COPY and just parallelize writing chunks of the COPY rows but that's a step backwards and will require an opt in with scary THIS ISNT TRANSACTIONAL warnings that will probably scare off most users. Note that parallel writes achieved through a nested transaction or root/leaf model would be just fine as long as the COPY is atomic. |
Can we make the non-transactional behavior opt-in? For example, with #85573 or some other CRDB-specific option for COPY. |
I think there's a bright future behind deferring the writes. I don't quite know how to do it, but it seems possible in the case of copy to on some level coax the execution to just buffering the writes and running them later. I think it gets complex for fancier executions like fk checks. |
We entertained that notion for 22.2 but dismissed it, I really think its a step in the wrong direction but if parallel local-only writes are too big a lift for the near future we can reconsider. Need to bone up on this deferred write concept... |
I think we need to get the KV team involved in the discussions. |
This discussion is interesting enough as a historical artifact, but this issue isn't helping anybody at this point. |
Is your feature request related to a problem? Please describe.
This ends up being a big issue arguing that the lack of support for parallelism in our
*kv.Txn
is making it very hard for us to write reasonable-latency global features. At one point the*kv.Txn
was believed to be safe, but that was buggy (#17197).Effectively all new stateful features added to cockroach use sql tables and, increasingly, use the sql to interact with that state. The problem is we tend to build this abstractions without thinking about latency implications and latency implications are really freaking hard to think about in the programming model.
One thing we sort of have to help but has not thus far been super reliable is the
ddl_analysis
benchmark suite (#50953). This isn't general and isn't really useful for internal things. It also doesn't deal with round-trips due to replication which system tables almost always experience.This use of SQL is pretty great for many reasons; we get to use sql abstractions and we get to dogfood (to an extent; the internal executor isn't exactly the same thing the client experiences, but close). Then, if you look at all of the above packages, we end up building the wrappers around these sql tables as reasonably clean and well defined abstractions. The problem with these abstractions is they almost always entail executing at least one sql statement synchronously. That means, in the normal course of operation when we need to do something in a loop involving one of these subsystems, we incur at least
N
global round-trips.There are a few different an important points to this:
*kv.Txn
API does not allow parallel writes (and only allows parallel reads if you jump through some hoops)Describe the solution you'd like
I think there's a few different approaches we could take. They range in how radical they are.
So, parallelism is probably our best way out; it certainly is the best fit to the programming model we're used to.
Additional context
Another important point is that these long-running transactions have a tendency to interfere with user queries to introspect this state. Use of global transactions may be the answer here!
There are some approaches to mitigate that like #35712 which might enable some amount of user introspection. That's what we're ultimately doing in a more hacky way in #60953.
Subsystems I have in mind:
Jira issue: CRDB-3094
The text was updated successfully, but these errors were encountered: