-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
server: return non-live, live info per table #83677
Conversation
@ajwerner I create this version with the most straight forward solution. Are there improvements you can see I should be adding here from the start? |
86d6787
to
4c4e5f5
Compare
My fear is that for large tables this is going to now make the request just not work where before it worked. I don't really know what to do about that. |
Anything we can do with the info about total range count from here which is just a little below (and I could move my changes to after that)? |
I have a cute idea which might solve both the problems here and the problems in @yuzefovich how crazy does it sound? It sort of looks like if I had an implementation and hooked it up in the below code, then maybe it'd work? Is there more to know about planning to make the pieces fit together? cockroach/pkg/sql/colexec/builtin_funcs.go Line 122 in 928f605
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I follow your idea - what exactly would benefit from the vectorization? Or are you thinking that in the vectorized infrastructure we could create a specialized operator for a particular builtin function which internally has parallelism (to issue those range status lookups) unlike in the row-by-row infra? That seems reasonable to me, and I think the code you're pointing at would be the right entry point, and then things should mostly just work.
There would be some minor changes to expose the fact of concurrency in this builtin (if we're using a Txn, we'd have to ask for a LeafTxn, also vectorizedFlowCreator.operatorConcurrency
from vectorized_flow.go
will need to be set to true
- actually doing that will ensure that we'll get a LeafTxn).
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)
Precisely.
Fortunately |
Hm, no, nothing else comes to mind (as long as the queries using this new builtin don't also include things like CDC / BulkIO - those don't run through the vectorized infra). |
@yuzefovich maybe you can help me out. I typed the code but it doesn't quite work (also I'm sure I'm not doing memory monitoring correctly, or at all). My understanding of why is that we don't actually vectorize the rendering of the expression. What can I do to force it to use the vectorized operator?
leads to the builtin being called as part of evaluation in the |
Maybe this code here is the problem, or at least, the source of inflexibility: cockroach/pkg/sql/distsql_spec_exec_factory.go Lines 168 to 176 in 928f605
|
I think the problem is the usage of cockroach/pkg/sql/distsql_physical_planner.go Line 3083 in 78caa5c
|
(Maybe it's not a "materialized view" but it's something (namely "virtual table" inside of it - |
Hmm, is there a way to force the code to look inside the expressions and inject some intermediate render node or something like that if there are optimized functions? I imagine there's a way to go from rows->columns->render->rows |
No, unfortunately nothing like that exists. The thing is that this |
Oh, actually, I take this, at least partially, back - the problem might be simpler than I initially thought. It does look like the rendering is in |
I typed up #83689 for that idea, but I think it's broken. However, it might be sufficient for your prototype. |
I didn't really follow your draft PR. Maybe we can collaborate on this for a little bit this afternoon if you're willing or some time next week. I've reached the point in the code where my lack of understanding of concepts means I'd need to learn a lot to make progress. |
(Bart delays provide ample time to code lol) |
Indeed:
|
45244fa
to
f86dcc9
Compare
@ajwerner @yuzefovich do you know if the range tables are not available during tests? |
Your code seems sad here: Line 933 in 8b83a21
It seems that the type is wrong the column it claims. Maybe you need to name the projection with the cast? I'm not sure. |
Before this change, we'd just log the error body, which often is not very helpful. The format specifier makes me think that the intention was to log the stacks. This made debugging [this](cockroachdb#83677 (comment)) trivial as opposed to hard. Release note: None
|
83875: opt: fix memo cycle caused by join ordering r=DrewKimball a=DrewKimball In some rare cases when null-rejection rules fail to fire, a redundant filter can be inferred in an `InnerJoin` - `LeftJoin` complex. This could previously result in the `JoinOrderBuilder` attempting to add a `Select` to the same memo group as its input, which would create a memo cycle. This patch prevents the `JoinOrderBuilder` from adding the `Select` to the memo in such cases. What follows is a more detailed explanation of the conditions that could previously cause a memo cycle. `InnerJoin` operators have two properties that make them more 'reorderable' than other types of joins: (1) their conjuncts can be reordered separately and (2) new conjuncts can be inferred from equalities. As a special case of (1), an `InnerJoin` can be pushed into the left side of a `LeftJoin`, leaving behind any `Select` conjuncts that reference the right side of the `LeftJoin`. This allows the `JoinOrderBuilder` to make the following transformation: ``` (InnerJoin A (InnerJoin B (LeftJoin C D c=d ) b=c ) a=b, a=d ) => (InnerJoin A (Select (LeftJoin (InnerJoin B C b=c ) D c=d ) b=d // Inferred filter! ) a=b, a=d ) ``` Note the new `b=d` filter that was inferred and subsequently left on a `Select` operator after the reordering. The crucial point is that this filter is redundant - the input to the `Select` is already a valid reordering of the `BCD` join complex. The `JoinOrderBuilder` avoids adding redundant filters to `InnerJoin` operators, but does not do the same for the `Select` case because it was assumed that the filters left on the `Select` would never be redundant. This is because the `a=d` filter *should* rejects nulls, so the `LeftJoin` should have been simplified. However, in rare cases null-rejection does not take place. Because the input to the `Select` is already a valid reordering, the `JoinOrderBuilder` ends up trying to add the `Select` to the same group as its input - namely, the `BCD` join group. This causes a cycle in the memo. Fixes #80901 Release note (bug fix): Fixed a bug that could cause an optimizer panic in rare cases when a query had a left join in the input of an inner join. 83945: storage/metamorphic: Add MVCC delete range using tombstone r=erikgrinaker a=itsbilal This change adds MVCCDeleteRangeUsingTombstone to the MVCC metamorphic tests. MVCCDeleteRange had already existed in this test suite, so this ended up being a relatively simple addition. One part of #82545, with possibly more parts to follow as other MVCC-level operations are added that utilize `writer.{Put,Clear}{MVCC,Engine}RangeKey`. Release note: None. 83992: server: log stacks on 500 errors r=ajwerner a=ajwerner Before this change, we'd just log the error body, which often is not very helpful. The format specifier makes me think that the intention was to log the stacks. This made debugging [this](#83677 (comment)) trivial as opposed to hard. Release note: None 84015: bazel: clear configurations when running `git grep` in `check.sh` r=miretskiy,mari-crl a=rickystewart The configurations `grep.{column,lineNumber,fullName}` can be set globally or on a per-user basis and change the output of `git grep`, which breaks checks that do exact-string matching. We manually clear these configurations for the `git grep`s in this file to ensure the output is predictable on any machine. Release note: None Co-authored-by: Andrew Kimball <[email protected]> Co-authored-by: Bilal Akhtar <[email protected]> Co-authored-by: Andrew Werner <[email protected]> Co-authored-by: Ricky Stewart <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM pending #83935
This commit adds 3 new parameter to the table details endpoint: - dataTotalBytes - dataLiveBytes - dataLivePercentage Partially addresses cockroachdb#82617 Release note (api change): Add information about total bytes, live (non MVCC) bytes and live (non MVCC) percentage to Table Details endpoint.
TFTR! |
Build succeeded: |
This commit adds 3 new parameter to the table
details endpoint:
Partially addresses #82617
Release note (api change): Add information about total bytes,
non live (MVCC) bytes and non live (MVCC) percentage to
Table Details endpoint.