Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: return non-live, live info per table #83677

Merged
merged 1 commit into from
Jul 12, 2022

Conversation

maryliag
Copy link
Contributor

@maryliag maryliag commented Jun 30, 2022

This commit adds 3 new parameter to the table
details endpoint:

  • dataTotalBytes
  • dataNonLiveBytes
  • dataNonLivePercentage

Partially addresses #82617

Release note (api change): Add information about total bytes,
non live (MVCC) bytes and non live (MVCC) percentage to
Table Details endpoint.

@maryliag maryliag requested review from a team as code owners June 30, 2022 20:24
@maryliag maryliag requested a review from a team June 30, 2022 20:24
@maryliag maryliag marked this pull request as draft June 30, 2022 20:24
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@maryliag maryliag removed request for a team June 30, 2022 20:24
@maryliag
Copy link
Contributor Author

@ajwerner I create this version with the most straight forward solution. Are there improvements you can see I should be adding here from the start?

@maryliag maryliag force-pushed the mvcc-database branch 3 times, most recently from 86d6787 to 4c4e5f5 Compare June 30, 2022 20:35
pkg/server/admin.go Outdated Show resolved Hide resolved
@ajwerner
Copy link
Contributor

My fear is that for large tables this is going to now make the request just not work where before it worked. I don't really know what to do about that.

@maryliag
Copy link
Contributor Author

Anything we can do with the info about total range count from here which is just a little below (and I could move my changes to after that)?
In the sense of if this count passes a threshold, I don't try to calculate the garbage? Or have another endpoint specific for it which will load the value independently? But if we follow your example of waiting 8min for the result, might not be worth trying this at all (at least with this implementation)

@ajwerner
Copy link
Contributor

ajwerner commented Jul 1, 2022

I have a cute idea which might solve both the problems here and the problems in crdb_internal.ranges. What if we built a vectorized operator for the function? Then we could issue the range status lookups in parallel. It might even be backportable to fix the crdb_internal.ranges issues.

@yuzefovich how crazy does it sound? It sort of looks like if I had an implementation and hooked it up in the below code, then maybe it'd work? Is there more to know about planning to make the pieces fit together?

switch overload.SpecializedVecBuiltin {

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow your idea - what exactly would benefit from the vectorization? Or are you thinking that in the vectorized infrastructure we could create a specialized operator for a particular builtin function which internally has parallelism (to issue those range status lookups) unlike in the row-by-row infra? That seems reasonable to me, and I think the code you're pointing at would be the right entry point, and then things should mostly just work.

There would be some minor changes to expose the fact of concurrency in this builtin (if we're using a Txn, we'd have to ask for a LeafTxn, also vectorizedFlowCreator.operatorConcurrency from vectorized_flow.go will need to be set to true - actually doing that will ensure that we'll get a LeafTxn).

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)

@ajwerner
Copy link
Contributor

ajwerner commented Jul 1, 2022

Or are you thinking that in the vectorized infrastructure we could create a specialized operator for a particular builtin function which internally has parallelism (to issue those range status lookups) unlike in the row-by-row infra?

Precisely.

There would be some minor changes to expose the fact of concurrency in this builtin (if we're using a Txn, we'd have to ask for a LeafTxn, also vectorizedFlowCreator.operatorConcurrency from vectorized_flow.go will need to be set to true - actually doing that will ensure that we'll get a LeafTxn).

Fortunately RangeStatsRequest is non-transactional, so I don't think we'd need anything like that. Any other gotchas?

@yuzefovich
Copy link
Member

Any other gotchas?

Hm, no, nothing else comes to mind (as long as the queries using this new builtin don't also include things like CDC / BulkIO - those don't run through the vectorized infra).

@ajwerner
Copy link
Contributor

ajwerner commented Jul 1, 2022

@yuzefovich maybe you can help me out. I typed the code but it doesn't quite work (also I'm sure I'm not doing memory monitoring correctly, or at all). My understanding of why is that we don't actually vectorize the rendering of the expression. What can I do to force it to use the vectorized operator?

explain (vec) select crdb_internal.range_stats(start_key) from crdb_internal.ranges limit 1;
              info
--------------------------------
  │
  └ Node 1
    └ *sql.planNodeToRowSource
(3 rows)

leads to the builtin being called as part of evaluation in the ProcessRowHelper as part of planNodeToRowSource.

@ajwerner
Copy link
Contributor

ajwerner commented Jul 1, 2022

Maybe this code here is the problem, or at least, the source of inflexibility:

planCtx := e.getPlanCtx(cannotDistribute)
// TODO: don't use a stored context.
ctx := planCtx.EvalContext().Context
physPlan, err := e.dsp.wrapPlan(ctx, planCtx, d, e.planningMode != distSQLLocalOnlyPlanning)
if err != nil {
return nil, err
}
physPlan.ResultColumns = d.columns
return makePlanMaybePhysical(physPlan, []planNode{d}), nil

@yuzefovich
Copy link
Member

I think the problem is the usage of crdb_internal.ranges which is a materialized view for which we don't have a corresponding processor / operator, so it gets wrapped in

func (dsp *DistSQLPlanner) wrapPlan(
(essentially the same code that you linked to, just on the main exec factory path).

@yuzefovich
Copy link
Member

(Maybe it's not a "materialized view" but it's something (namely "virtual table" inside of it - crdb_internal.ranges_no_leases) that is not powered by the DistSQL, so we, instead, create a planNode that is wrapped via LocalPlanNode wrapper into the DistSQL infra). (This could be more easily seen with EXPLAIN (DISTSQL).) And then you're right - we don't look inside of that planNode at all.

@ajwerner
Copy link
Contributor

ajwerner commented Jul 1, 2022

Hmm, is there a way to force the code to look inside the expressions and inject some intermediate render node or something like that if there are optimized functions? I imagine there's a way to go from rows->columns->render->rows

@yuzefovich
Copy link
Member

No, unfortunately nothing like that exists. The thing is that this planNode-into-DistSQL wrapping was added before the vectorized execution, when we were merging the local and the DistSQL row-by-row execution models, and we never really changed this wrapping. It does try to find the closest child planNode that we don't have to wrap, but this is not enough in this case. I think the problem here is that we have virtualPlanNode (which internally does the rendering you're interested in optimizing), and we treat this planNode as a black-box when performing DistSQL / vectorized planning.

@yuzefovich
Copy link
Member

Oh, actually, I take this, at least partially, back - the problem might be simpler than I initially thought. It does look like the rendering is in PostProcessSpec and not in the processor core, and we could do the vectorized planning for it. I actually have been thinking about doing that for some time (in order to support more efficient vectorized filtering on top of the joinReader processor). So you are right - we should be able to get to "rows -> columns -> vectorized rendering -> rows" model.

@yuzefovich
Copy link
Member

I typed up #83689 for that idea, but I think it's broken. However, it might be sufficient for your prototype.

@ajwerner
Copy link
Contributor

ajwerner commented Jul 1, 2022

I didn't really follow your draft PR. Maybe we can collaborate on this for a little bit this afternoon if you're willing or some time next week. I've reached the point in the code where my lack of understanding of concepts means I'd need to learn a lot to make progress.

@yuzefovich
Copy link
Member

I polished #83689, and I think it is in a good shape (modulo green CI), so @ajwerner if you try your prototype with the vectorized builtin on top of that PR, I think you should see the builtin be vectorized.

@yuzefovich
Copy link
Member

(Bart delays provide ample time to code lol)

@yuzefovich
Copy link
Member

Indeed:

[email protected]:26257/defaultdb> explain (vec) select crdb_internal.range_stats(start_key) from crdb_internal.ranges limit 1;
                   info
-------------------------------------------
  │
  └ Node 1
    └ *colexec.defaultBuiltinFuncOperator
      └ *sql.planNodeToRowSource
(4 rows)

@maryliag maryliag force-pushed the mvcc-database branch 2 times, most recently from 45244fa to f86dcc9 Compare July 6, 2022 18:09
@maryliag maryliag requested a review from a team July 6, 2022 18:09
@maryliag maryliag marked this pull request as ready for review July 6, 2022 18:10
@maryliag maryliag changed the title server: return garbage percentage per table server: return non-live, live info per table Jul 7, 2022
@maryliag
Copy link
Contributor Author

maryliag commented Jul 7, 2022

@ajwerner @yuzefovich do you know if the range tables are not available during tests?
Tests are failing on this line, returning 500 An internal server error has occurred, but I'm having a hard time finding the root error here. I tried to manually do the same steps the test using demo and it works, so not sure what I'm missing.
Is there anything else I should be initializing here?

@ajwerner
Copy link
Contributor

ajwerner commented Jul 7, 2022

Your code seems sad here:

if err := scanner.Scan(row, "total", &totalBytes); err != nil {

It seems that the type is wrong the column it claims. Maybe you need to name the projection with the cast? I'm not sure.

ajwerner added a commit to ajwerner/cockroach that referenced this pull request Jul 7, 2022
Before this change, we'd just log the error body, which often is not very
helpful. The format specifier makes me think that the intention was to log
the stacks. This made debugging [this](cockroachdb#83677 (comment))
trivial as opposed to hard.

Release note: None
@ajwerner
Copy link
Contributor

ajwerner commented Jul 7, 2022

E220707 14:03:18.840447 5147 server/admin.go:300  [n1] 600  source type assertion failed                                                                                                                                                                                                                  
E220707 14:03:18.840447 5147 server/admin.go:300  [n1] 600 +(1) attached stack trace                                                                                                                                                                                                                      
E220707 14:03:18.840447 5147 server/admin.go:300  [n1] 600 +  -- stack trace:                                                        
E220707 14:03:18.840447 5147 server/admin.go:300  [n1] 600 +  | github.com/cockroachdb/cockroach/pkg/server.resultScanner.ScanIndex                                                                                                                                                                       
E220707 14:03:18.840447 5147 server/admin.go:300  [n1] 600 +  |         github.com/cockroachdb/cockroach/pkg/server/admin.go:3159                                                                                                                                                                         
E220707 14:03:18.840447 5147 server/admin.go:300  [n1] 600 +  | github.com/cockroachdb/cockroach/pkg/server.resultScanner.Scan                                                                                                                                                                            
E220707 14:03:18.840447 5147 server/admin.go:300  [n1] 600 +  |         github.com/cockroachdb/cockroach/pkg/server/admin.go:3261                                                                                                                                                                         
E220707 14:03:18.840447 5147 server/admin.go:300  [n1] 600 +  | github.com/cockroachdb/cockroach/pkg/server.(*adminServer).tableDetailsHelper                                                                                                                                                             
E220707 14:03:18.840447 5147 server/admin.go:300  [n1] 600 +  |         github.com/cockroachdb/cockroach/pkg/server/admin.go:933                                                                                                                                                                          
E220707 14:03:18.840447 5147 server/admin.go:300  [n1] 600 +  | github.com/cockroachdb/cockroach/pkg/server.(*adminServer).TableDetails                                                                                                                                                                   
E220707 14:03:18.840447 5147 server/admin.go:300  [n1] 600 +  |         github.com/cockroachdb/cockroach/pkg/server/admin.go:678                                                                                                                                                                          
E220707 14:03:18.840447 5147 server/admin.go:300  [n1] 600 +  | github.com/cockroachdb/cockroach/pkg/server/serverpb._Admin_TableDetails_Handler.func1                                                                                                                                                    
E220707 14:03:18.840447 5147 server/admin.go:300  [n1] 600 +  |         github.com/cockroachdb/cockroach/pkg/server/serverpb/bazel-out/darwin-fastbuild/bin/pkg/server/serverpb/serverpb_go_proto_/github.com/cockroachdb/cockroach/pkg/server/serverpb/admin.pb.go:4817                                  
E220707 14:03:18.840447 5147 server/admin.go:300  [n1] 600 +  | github.com/cockroachdb/cockroach/pkg/util/tracing/grpcinterceptor.ServerInterceptor.func1                                                                                                                                                 
E220707 14:03:18.840447 5147 server/admin.go:300  [n1] 600 +  |         github.com/cockroachdb/cockroach/pkg/util/tracing/grpcinterceptor/grpc_interceptor.go:114                                                                                                                                         
E220707 14:03:18.840447 5147 server/admin.go:300  [n1] 600 +  | google.golang.org/grpc.chainUnaryInterceptors.func1.1                                                                                                                                                                                     
E220707 14:03:18.840447 5147 server/admin.go:300  [n1] 600 +  |         google.golang.org/grpc/external/org_golang_google_grpc/server.go:1117                                                                                                                                                             
E220707 14:03:18.840447 5147 server/admin.go:300  [n1] 600 +  | github.com/cockroachdb/cockroach/pkg/rpc.NewServerEx.func3                                                                                                                                                                                
E220707 14:03:18.840447 5147 server/admin.go:300  [n1] 600 +  |         github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:264                                                                                                                                                                   
E220707 14:03:18.840447 5147 server/admin.go:300  [n1] 600 +  | google.golang.org/grpc.chainUnaryInterceptors.func1.1                                                                                                                                                                                     
E220707 14:03:18.840447 5147 server/admin.go:300  [n1] 600 +  |         google.golang.org/grpc/external/org_golang_google_grpc/server.go:1120                                                                                                                                                             
E220707 14:03:18.840447 5147 server/admin.go:300  [n1] 600 +  | github.com/cockroachdb/cockroach/pkg/rpc.kvAuth.unaryInterceptor                                                                                                                                                                          
E220707 14:03:18.840447 5147 server/admin.go:300  [n1] 600 +  |         github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/auth.go:73                 
E220707 14:03:18.840447 5147 server/admin.go:300  [n1] 600 +  | google.golang.org/grpc.chainUnaryInterceptors.func1.1                           
E220707 14:03:18.840447 5147 server/admin.go:300  [n1] 600 +  |         google.golang.org/grpc/external/org_golang_google_grpc/server.go:1120                                                                                                                                                             
E220707 14:03:18.840447 5147 server/admin.go:300  [n1] 600 +  | github.com/cockroachdb/cockroach/pkg/rpc.NewServerEx.func1.1                                                                                                                                                                              
E220707 14:03:18.840447 5147 server/admin.go:300  [n1] 600 +  |         github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:233                                                                                                                                                                   
E220707 14:03:18.840447 5147 server/admin.go:300  [n1] 600 +  | github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunTaskWithErr                                                                                                                                                                  
E220707 14:03:18.840447 5147 server/admin.go:300  [n1] 600 +  |         github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:341                                                                                                                                                                     
E220707 14:03:18.840447 5147 server/admin.go:300  [n1] 600 +  | github.com/cockroachdb/cockroach/pkg/rpc.NewServerEx.func1                                                                                                                                                                                
E220707 14:03:18.840447 5147 server/admin.go:300  [n1] 600 +  |         github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:231                                                                                                                                                                   
E220707 14:03:18.840447 5147 server/admin.go:300  [n1] 600 +  | google.golang.org/grpc.chainUnaryInterceptors.func1.1                                
E220707 14:03:18.840447 5147 server/admin.go:300  [n1] 600 +  |         google.golang.org/grpc/external/org_golang_google_grpc/server.go:1120
E220707 14:03:18.840447 5147 server/admin.go:300  [n1] 600 +  | google.golang.org/grpc.chainUnaryInterceptors.func1                                                                                                                                                                                       
E220707 14:03:18.840447 5147 server/admin.go:300  [n1] 600 +  |         google.golang.org/grpc/external/org_golang_google_grpc/server.go:1122                                                                                                                                                             
E220707 14:03:18.840447 5147 server/admin.go:300  [n1] 600 +  | github.com/cockroachdb/cockroach/pkg/server/serverpb._Admin_TableDetails_Handler                                                                                                                                                          
E220707 14:03:18.840447 5147 server/admin.go:300  [n1] 600 +  |         github.com/cockroachdb/cockroach/pkg/server/serverpb/bazel-out/darwin-fastbuild/bin/pkg/server/serverpb/serverpb_go_proto_/github.com/cockroachdb/cockroach/pkg/server/serverpb/admin.pb.go:4819                                  
E220707 14:03:18.840447 5147 server/admin.go:300  [n1] 600 +  | google.golang.org/grpc.(*Server).processUnaryRPC                                                                                                                                                                                          
E220707 14:03:18.840447 5147 server/admin.go:300  [n1] 600 +  |         google.golang.org/grpc/external/org_golang_google_grpc/server.go:1283                                                                                                                                                             
E220707 14:03:18.840447 5147 server/admin.go:300  [n1] 600 +  | google.golang.org/grpc.(*Server).handleStream                                                                                                                                                                                             
E220707 14:03:18.840447 5147 server/admin.go:300  [n1] 600 +  |         google.golang.org/grpc/external/org_golang_google_grpc/server.go:1620                                                                                                                                                             
E220707 14:03:18.840447 5147 server/admin.go:300  [n1] 600 +  | google.golang.org/grpc.(*Server).serveStreams.func1.2                                                                                                                                                                                     
E220707 14:03:18.840447 5147 server/admin.go:300  [n1] 600 +  |         google.golang.org/grpc/external/org_golang_google_grpc/server.go:922                                                                                                                                                              
E220707 14:03:18.840447 5147 server/admin.go:300  [n1] 600 +  | runtime.goexit                                                                                                                                                                                                                            
E220707 14:03:18.840447 5147 server/admin.go:300  [n1] 600 +  |         GOROOT/src/runtime/asm_amd64.s:1581 

#83992

craig bot pushed a commit that referenced this pull request Jul 7, 2022
83875: opt: fix memo cycle caused by join ordering r=DrewKimball a=DrewKimball

In some rare cases when null-rejection rules fail to fire, a redundant filter
can be inferred in an `InnerJoin` - `LeftJoin` complex. This could previously
result in the `JoinOrderBuilder` attempting to add a `Select` to the same memo
group as its input, which would create a memo cycle. This patch prevents the
`JoinOrderBuilder` from adding the `Select` to the memo in such cases.

What follows is a more detailed explanation of the conditions that could
previously cause a memo cycle.

`InnerJoin` operators have two properties that make them more 'reorderable'
than other types of joins: (1) their conjuncts can be reordered separately
and (2) new conjuncts can be inferred from equalities. As a special case of
(1), an `InnerJoin` can be pushed into the left side of a `LeftJoin`, leaving
behind any `Select` conjuncts that reference the right side of the `LeftJoin`.

This allows the `JoinOrderBuilder` to make the following transformation:
```
(InnerJoin
   A
   (InnerJoin
      B
      (LeftJoin
         C
         D
         c=d
      )
      b=c
   )
   a=b, a=d
)
=>
(InnerJoin
   A
   (Select
      (LeftJoin
         (InnerJoin
            B
            C
            b=c
         )
         D
         c=d
      )
      b=d // Inferred filter!
   )
   a=b, a=d
)
```
Note the new `b=d` filter that was inferred and subsequently left on
a `Select` operator after the reordering. The crucial point is that
this filter is redundant - the input to the `Select` is already a
valid reordering of the `BCD` join complex.

The `JoinOrderBuilder` avoids adding redundant filters to `InnerJoin`
operators, but does not do the same for the `Select` case because it
was assumed that the filters left on the `Select` would never be redundant.
This is because the `a=d` filter *should* rejects nulls, so the `LeftJoin`
should have been simplified. However, in rare cases null-rejection does not
take place. Because the input to the `Select` is already a valid reordering,
the `JoinOrderBuilder` ends up trying to add the `Select` to the same group
as its input - namely, the `BCD` join group. This causes a cycle in the memo.

Fixes #80901

Release note (bug fix): Fixed a bug that could cause an optimizer
panic in rare cases when a query had a left join in the input of
an inner join.

83945: storage/metamorphic: Add MVCC delete range using tombstone r=erikgrinaker a=itsbilal

This change adds MVCCDeleteRangeUsingTombstone to the MVCC
metamorphic tests. MVCCDeleteRange had already existed in
this test suite, so this ended up being a relatively simple
addition.

One part of #82545, with possibly more parts to follow
as other MVCC-level operations are added that utilize
`writer.{Put,Clear}{MVCC,Engine}RangeKey`.

Release note: None.

83992: server: log stacks on 500 errors r=ajwerner a=ajwerner

Before this change, we'd just log the error body, which often is not very
helpful. The format specifier makes me think that the intention was to log
the stacks. This made debugging [this](#83677 (comment))
trivial as opposed to hard.

Release note: None

84015: bazel: clear configurations when running `git grep` in `check.sh` r=miretskiy,mari-crl a=rickystewart

The configurations `grep.{column,lineNumber,fullName}` can be set
globally or on a per-user basis and change the output of `git grep`,
which breaks checks that do exact-string matching. We manually clear
these configurations for the `git grep`s in this file to ensure the
output is predictable on any machine.

Release note: None

Co-authored-by: Andrew Kimball <[email protected]>
Co-authored-by: Bilal Akhtar <[email protected]>
Co-authored-by: Andrew Werner <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
@maryliag maryliag requested review from ajwerner and kevin-v-ngo July 8, 2022 21:45
Copy link
Contributor

@matthewtodd matthewtodd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM pending #83935

This commit adds 3 new parameter to the table
details endpoint:
- dataTotalBytes
- dataLiveBytes
- dataLivePercentage

Partially addresses cockroachdb#82617

Release note (api change): Add information about total bytes,
live (non MVCC) bytes and live (non MVCC) percentage to
Table Details endpoint.
@maryliag
Copy link
Contributor Author

TFTR!
bors r+

@craig
Copy link
Contributor

craig bot commented Jul 12, 2022

Build succeeded:

@craig craig bot merged commit 0aa3c44 into cockroachdb:master Jul 12, 2022
@maryliag maryliag deleted the mvcc-database branch July 12, 2022 13:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants