Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: add SHOW TENANT name WITH REPLICATION STATUS #92628

Merged
merged 1 commit into from
Dec 8, 2022

Conversation

lidorcarmel
Copy link
Contributor

@lidorcarmel lidorcarmel commented Nov 28, 2022

Extending SHOW TENANT to also allow showing replication status which contains info such as the protected timestamp on the destination cluster and the source cluster name.

Command output right after the destination cluster is created:

[email protected]:26257/defaultdb> show tenant dest5 with replication status;
  id | name  | status | source_tenant_name | source_cluster_uri | replication_job_id | replicated_time | retained_time
-----+-------+--------+--------------------+--------------------+--------------------+-----------------+----------------
   7 | dest5 | ADD    | NULL               | NULL               | 819890711267737601 | NULL            | NULL
(1 row)

A bit later we have most stats (manually adjusting the source_cluster_uri):

[email protected]:26257/defaultdb> show tenant dest5 with replication status;
  id | name  | status | source_tenant_name | source_cluster_uri                                    | replication_job_id | replicated_time |       retained_time
-----+-------+--------+--------------------+-------------------------------------------------------+--------------------+-----------------+-----------------------------
   7 | dest5 | ADD    | src                | postgresql://[email protected]:26257/defaultdb?ssl...crt | 819890711267737601 | NULL            | 2022-12-05 23:00:04.516331
(1 row)

And a moment later the replication time is populated.

Fixes: #91261

Epic: CRDB-18749

Release note: None

@lidorcarmel lidorcarmel requested review from stevendanna, adityamaru and a team November 28, 2022 23:05
@lidorcarmel lidorcarmel requested a review from a team as a code owner November 28, 2022 23:05
@lidorcarmel lidorcarmel force-pushed the lidor_show_tenant_with branch from a62463f to 4112186 Compare November 29, 2022 22:18
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@adityamaru adityamaru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM but two blocking comments:

  1. Should this code live in CCL?

  2. Should we start using expr for the tenant name based on our discussion last week that its better to start with expr and then become more restrictive in the future if the need be.

pkg/sql/parser/sql.y Outdated Show resolved Hide resolved
@adityamaru adityamaru self-requested a review December 1, 2022 17:24
@lidorcarmel lidorcarmel force-pushed the lidor_show_tenant_with branch from 4112186 to f16a876 Compare December 2, 2022 22:04
@lidorcarmel
Copy link
Contributor Author

Overall LGTM but two blocking comments:

  1. Should this code live in CCL?

So, maybe? IIUC we need SHOW TENANT to stay in BSL land and SHOW TENANT name WITH REPLICATION STATUS to move to CCL, which means that we need to break things to 2 different files. The benefit is not super clear to me because code in show_tenant.go doesn't do much.. just runs the ccl code. So, I was thinking it's fine to keep it as is and move stuff later if it adds some value? that said, if I'm missing something then I can do that in this PR.
Thanks for the offline chat about this!

  1. Should we start using expr for the tenant name based on our discussion last week that its better to start with expr and then become more restrictive in the future if the need be.

Done, d_expr.

@adityamaru adityamaru self-requested a review December 5, 2022 14:01
pkg/sql/catalog/colinfo/result_columns.go Show resolved Hide resolved
pkg/sql/show_tenant.go Outdated Show resolved Hide resolved
replicationInfo *streampb.StreamIngestionStats
pts hlc.Timestamp
columns colinfo.ResultColumns
done bool
}

func (p *planner) ShowTenant(_ context.Context, n *tree.ShowTenant) (planNode, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meta comment about the code structure, I think we should follow a pattern similar to sql/tenant_settings.go. Namely, this method should:

  1. RequireAdminRole
  2. Ensure we are running as the SystemTenant
  3. Call p.analyzeExpr to get a TypedExpr from the Expr for the name. (this will give us subquery support for free, and seems to be the canonical way to evaluate expressions to specialized types).
  4. Stash the TypedExpr in the showTenantNode

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 and 2 - Done.

re 3 and 4, let's do that later? I saw the TypedExpr thing but I prefer adding this when I can actually test subqueries, otherwise it feels like making things a bit more complicated without actual impact. if for example we would not want subqueries then I think the existing code is a bit easier to read compared to the TypedExpr thing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels a little weird to me to be accepting an expression but then only using .String() when using the expression, instead of analyzing/asserting the type of the expression in a canonical manner. imo if we are using d_expr we should do the annoying work of evaluating the expression all the way through in the same diff. We can test subqueries, concat expressions etc. once we have the datadriven PR checked in, which is what we're doing for some of the other tests in the e2e file as well.

Alternatively, we can go back to using tree.Name and change this to d_expr when we change it for CREATE/DROP TENANT as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! definitely need to add tests for this, data driven.
PTAL.

} else {
node.columns = colinfo.TenantColumns
}
return node, nil
}

func (n *showTenantNode) startExec(params runParams) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following from the comment above, this method should then evaluate the typed Expr in a manner similar to resolveTenantID to get our hands on tree.DString which can then be converted to a roachpb.TenantName to be used below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above.

pkg/sql/show_tenant.go Outdated Show resolved Hide resolved
pkg/sql/show_tenant.go Outdated Show resolved Hide resolved
require.Equal(t, ingestionJobID, jobId)
require.Less(t, maxReplTime, timeutil.Now())
require.Less(t, protectedTime, timeutil.Now())
require.Greater(t, maxReplTime, testStartTime)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm I think a tighter check would be if require.Greater(t, maxReplTime, highWatermark of C2C we know we have reached)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like that?

@lidorcarmel lidorcarmel force-pushed the lidor_show_tenant_with branch from f16a876 to ea05406 Compare December 6, 2022 05:08
Copy link
Contributor Author

@lidorcarmel lidorcarmel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Aditya for the thorough review!

require.Equal(t, ingestionJobID, jobId)
require.Less(t, maxReplTime, timeutil.Now())
require.Less(t, protectedTime, timeutil.Now())
require.Greater(t, maxReplTime, testStartTime)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like that?

pkg/sql/show_tenant.go Outdated Show resolved Hide resolved
pkg/sql/catalog/colinfo/result_columns.go Show resolved Hide resolved
pkg/sql/show_tenant.go Outdated Show resolved Hide resolved
pkg/sql/show_tenant.go Outdated Show resolved Hide resolved
replicationInfo *streampb.StreamIngestionStats
pts hlc.Timestamp
columns colinfo.ResultColumns
done bool
}

func (p *planner) ShowTenant(_ context.Context, n *tree.ShowTenant) (planNode, error) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 and 2 - Done.

re 3 and 4, let's do that later? I saw the TypedExpr thing but I prefer adding this when I can actually test subqueries, otherwise it feels like making things a bit more complicated without actual impact. if for example we would not want subqueries then I think the existing code is a bit easier to read compared to the TypedExpr thing.

} else {
node.columns = colinfo.TenantColumns
}
return node, nil
}

func (n *showTenantNode) startExec(params runParams) error {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above.

Copy link
Contributor

@adityamaru adityamaru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM, I think I only disagree about how we're doing expression evaluation currently. Even with that, if you'd rather check this in and fast follow once the datadriven framework works, I'm okay with that. I'm not worried about us never getting to this because we need to fix it for CREATE/DROP TENANT too.

replicatedTimestamp := tree.DNull
retainedTimestamp := tree.DNull

// New replication clusters don't have replicationInfo initially.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

q: what does this mean? I'm not sure I follow what case we're guarding against here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dropped, it's not clear and repeating the comment above. See the comment and todo after calling mgr.GetStreamIngestionStats() above? hopefully this would make sense.

return err
}
stats, err := mgr.GetStreamIngestionStats(params.ctx, info.TenantReplicationJobID)
// An error means we don't have stats but we can still present some info,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would we not have stats if we have an active replication job as we have asserted above by checking `TenantReplicationJobID?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good question, we have most stats but not all because we fail here:

client, err := streamclient.GetFirstActiveClient(ctx, progress.GetStreamIngest().StreamAddresses)

I think it will be nice to refactor and return what we can.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh I see, lets log the error if its non-nil?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@@ -60,6 +60,9 @@ id name status
statement error tenant "seven" does not exist
SHOW TENANT seven

statement error tenant two does not have replication info
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm I would expect the error we hit would indicate there is no active replication job rather than info, wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, that failed in ci, fixed now.

replicationInfo *streampb.StreamIngestionStats
pts hlc.Timestamp
columns colinfo.ResultColumns
done bool
}

func (p *planner) ShowTenant(_ context.Context, n *tree.ShowTenant) (planNode, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels a little weird to me to be accepting an expression but then only using .String() when using the expression, instead of analyzing/asserting the type of the expression in a canonical manner. imo if we are using d_expr we should do the annoying work of evaluating the expression all the way through in the same diff. We can test subqueries, concat expressions etc. once we have the datadriven PR checked in, which is what we're doing for some of the other tests in the e2e file as well.

Alternatively, we can go back to using tree.Name and change this to d_expr when we change it for CREATE/DROP TENANT as well.

@adityamaru adityamaru self-requested a review December 6, 2022 14:14
@adityamaru adityamaru self-requested a review December 6, 2022 14:17
@lidorcarmel lidorcarmel force-pushed the lidor_show_tenant_with branch from ea05406 to 029e8da Compare December 6, 2022 23:40
Copy link
Contributor Author

@lidorcarmel lidorcarmel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! PTAL.

replicationInfo *streampb.StreamIngestionStats
pts hlc.Timestamp
columns colinfo.ResultColumns
done bool
}

func (p *planner) ShowTenant(_ context.Context, n *tree.ShowTenant) (planNode, error) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! definitely need to add tests for this, data driven.
PTAL.

@@ -60,6 +60,9 @@ id name status
statement error tenant "seven" does not exist
SHOW TENANT seven

statement error tenant two does not have replication info
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, that failed in ci, fixed now.

return err
}
stats, err := mgr.GetStreamIngestionStats(params.ctx, info.TenantReplicationJobID)
// An error means we don't have stats but we can still present some info,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good question, we have most stats but not all because we fail here:

client, err := streamclient.GetFirstActiveClient(ctx, progress.GetStreamIngest().StreamAddresses)

I think it will be nice to refactor and return what we can.

Copy link
Contributor

@adityamaru adityamaru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working through this! LGTM, we can add more tests soon.

I might be missing it, but I don't see a test with a hyphenated name.

return err
}
stats, err := mgr.GetStreamIngestionStats(params.ctx, info.TenantReplicationJobID)
// An error means we don't have stats but we can still present some info,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh I see, lets log the error if its non-nil?

}

var dummyHelper tree.IndexedVarHelper
strName := paramparse.UnresolvedNameToStrVal(n.Name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you know why we need this call to UnresolvedNameToStrVal instead of just passing in n.Name to analyzeExpr below? My understanding is that this method is only called in the context of SET clauses.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one was indeed an annoying one, without this I get:

    logic.go:3182:

        /private/var/tmp/_bazel_lidorcarmel/8fa5a2c4db0103c8ea8fcdb54466d4b1/sandbox/darwin-sandbox/11989/execroot/com_github_cockroachdb_cockroach/bazel-out/darwin-fastbuild/bin/pkg/sql/logictest/tests/fakedist-vec-off/fakedist-vec-off_test_/fakedist-vec-off_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/tenant:30: SHOW TENANT system
        expected success, but found
        (42703) column "system" does not exist
        type_check.go:1433: in TypeCheck()
    logic.go:2856:
         pq: column "system" does not exist

is there a better way around this? I poked around a bit and other than calling tree.NewStrVal(tree.AsStringWithFlags(s, tree.FmtBareIdentifiers)) directly I'm not sure what is more appropriate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh hmm good point, I think SHOW TENANT 'system' will work but I'm not sure what we should strive for here. SHOW TENANT system or SHOW TENANT 'system'. I was going with the latter in ALTER TENANT which also uses a d_expr.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting, I was definitely assuming we need SHOW TENANT system, like SHOW CREATE TABLE system.tenants (maybe there are better examples?).

@lidorcarmel lidorcarmel force-pushed the lidor_show_tenant_with branch from 029e8da to 4036a9c Compare December 7, 2022 17:50
Copy link
Contributor Author

@lidorcarmel lidorcarmel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for some reason I didn't push the hyphen test, it's pushed now.

return err
}
stats, err := mgr.GetStreamIngestionStats(params.ctx, info.TenantReplicationJobID)
// An error means we don't have stats but we can still present some info,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

}

var dummyHelper tree.IndexedVarHelper
strName := paramparse.UnresolvedNameToStrVal(n.Name)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one was indeed an annoying one, without this I get:

    logic.go:3182:

        /private/var/tmp/_bazel_lidorcarmel/8fa5a2c4db0103c8ea8fcdb54466d4b1/sandbox/darwin-sandbox/11989/execroot/com_github_cockroachdb_cockroach/bazel-out/darwin-fastbuild/bin/pkg/sql/logictest/tests/fakedist-vec-off/fakedist-vec-off_test_/fakedist-vec-off_test.runfiles/com_github_cockroachdb_cockroach/pkg/sql/logictest/testdata/logic_test/tenant:30: SHOW TENANT system
        expected success, but found
        (42703) column "system" does not exist
        type_check.go:1433: in TypeCheck()
    logic.go:2856:
         pq: column "system" does not exist

is there a better way around this? I poked around a bit and other than calling tree.NewStrVal(tree.AsStringWithFlags(s, tree.FmtBareIdentifiers)) directly I'm not sure what is more appropriate.

@lidorcarmel lidorcarmel force-pushed the lidor_show_tenant_with branch 2 times, most recently from 7a3319b to eeceee6 Compare December 8, 2022 18:38
Extending SHOW TENANT to also allow showing replication status which
contains info such as the protected timestamp on the destination cluster
and the source cluster name.

Command output right after the destination cluster is created:

```
[email protected]:26257/defaultdb> show tenant dest5 with replication status;
  id | name  | status | source_tenant_name | source_cluster_uri | replication_job_id | replicated_time | retained_time
-----+-------+--------+--------------------+--------------------+--------------------+-----------------+----------------
   7 | dest5 | ADD    | NULL               | NULL               | 819890711267737601 | NULL            | NULL
(1 row)
```

A bit later we have most stats (manually adjusting the source_cluster_uri):
```
[email protected]:26257/defaultdb> show tenant dest5 with replication status;
  id | name  | status | source_tenant_name | source_cluster_uri                                    | replication_job_id | replicated_time |       retained_time
-----+-------+--------+--------------------+-------------------------------------------------------+--------------------+-----------------+-----------------------------
   7 | dest5 | ADD    | src                | postgresql://[email protected]:26257/defaultdb?ssl...crt | 819890711267737601 | NULL            | 2022-12-05 23:00:04.516331
(1 row)
```

And a moment later the replication time is populated.

Informs: cockroachdb#91261

Epic: CRDB-18749

Release note: None
@lidorcarmel lidorcarmel force-pushed the lidor_show_tenant_with branch from eeceee6 to dcd183c Compare December 8, 2022 18:44
@lidorcarmel
Copy link
Contributor Author

TFTR Aditya!
bors r+

@craig craig bot merged commit fe683e3 into cockroachdb:master Dec 8, 2022
@craig
Copy link
Contributor

craig bot commented Dec 8, 2022

Build succeeded:

@lidorcarmel lidorcarmel deleted the lidor_show_tenant_with branch January 24, 2023 20:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

c2c: introduce SHOW TENANT to surface information about a tenant record
3 participants