Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document how to use crdb_internal #2957

Closed
lnhsingh opened this issue Apr 19, 2018 · 19 comments · Fixed by #8909
Closed

Document how to use crdb_internal #2957

lnhsingh opened this issue Apr 19, 2018 · 19 comments · Fixed by #8909
Assignees
Labels
O-product P-2 Normal priority; secondary task
Milestone

Comments

@lnhsingh
Copy link
Contributor

lnhsingh commented Apr 19, 2018

In trying to figure out how to read from the information_schema table, it was suggested by @benesch that it might be beneficial to teach people how to use crdb_internal.

Taken from slack:

My opinion is that it probably makes more sense to teach people to use crdb_internal
information_schema is technically "stable" but it has a bunch of random crap because it needs to describe any SQL database ever
And crdb_internal only has stuff that's actually relevant to Cockroach
@benesch
Copy link
Contributor

benesch commented Apr 19, 2018

The pushback here will be that we don't want applications to rely on crdb_internal. If production applications query crdb_internal in we'll be locked in to backwards compatibility, and we've tried hard to avoid that so far. I think we could solve this with big "EXPERIMENTAL ONLY" callouts—it is super useful for one off-introspections a broken cluster, where backwards compatibility concerns don't apply—but now that I think about it I am kind of nervous about making it easier to use crdb_internal.

Maybe the right answer is to ensure that absolutely everything useful in crdb_internal eventually has a stable interface. Was there something in particular you were trying to read from information_schema, @lhirata?

@lnhsingh
Copy link
Contributor Author

Interesting - will see what @bdarnell and @jseldess say too.

Re: information_schema, I was trying to remember how to read from it (for table constraints). Was able to figure it out by googling it. I ended up adding examples to the Information Schema doc, to hopefully make it easier for future me to figure out: #2958

@benesch
Copy link
Contributor

benesch commented Apr 19, 2018

Ah, I see! Yeah, definitely a good idea to show people a representative example or two. But documenting all of its facets will be a lot of effort for little payoff, especially given that it's mostly documented elsewhere (e.g., in the Postgres docs and the SQL standard).

Also we have a much nicer SHOW CONSTRAINTS for the specific case of showing a table's constraints. Apologies if you already knew that!

@jseldess
Copy link
Contributor

Adding a comment from @bdarnell:

In general my thinking is that crdb_internal should remain internal-only and undocumented. we should take the stuff that's useful enough to document and put it behind SHOW commands or maybe make a separate crdb_catalog virtual schema with more stability/documentation guarantees
i also lean towards documenting information_schema instead of introducing yet another db-specific view of this data, although i don't know enough about what information_schema actually looks like to have an informed opinion here.

Closing this as a result. But I'll ping product on identifying useful parts of crdb_internal to put behind SHOW statements.

@awoods187
Copy link
Contributor

I spoke with @bdarnell and @jordanlewis about this issue today in service of partitioning. We decided that the cat is out of the proverbial bag and it makes sense to document our internal tables rather than making a crdb_catalog.

As @bdarnell notes:

this just means we need to hold ourselves to a higher backwards-compatibility standard when introducing things in crdb_internal. Further, we're already doing that as we have to introduce ranges_no_leases instead of changing ranges

On the topic of backwards compatibility:

we will need to come up with a process to warn users about changes. It is unrelated to "last 3 releases" and we will need to introduce some sort of user-visible deprecation process with warnings so they can know to update their scripts (or potentially break scripting)

@awoods187 awoods187 reopened this Jun 26, 2019
@awoods187 awoods187 added P-1 High priority; must be done this release A-sql labels Jun 26, 2019
@jseldess
Copy link
Contributor

@awoods187, @bdarnell, since we're deciding to document crdb_internal, shouldn't that database be exposed like system?

~/go/src/github.com/cockroachdb/docs$ git commit -m "New image"
[cluster-patterns-v2 8880b114] New image
 2 files changed, 2 insertions(+)
 create mode 100644 images/v19.1/geo-partitioning_no_map.png
~/go/src/github.com/cockroachdb/docs$ cockroach demo
#
# Welcome to the CockroachDB demo database!
#
# You are connected to a temporary, in-memory CockroachDB
# instance. Your changes will not be saved!
#
# Web UI: http://127.0.0.1:55694
#
# Server version: CockroachDB CCL v19.2.0-alpha.20190606-95-ga581c20147 (x86_64-apple-darwin17.7.0, built 2019/06/12 18:17:23, go1.11) (same version as client)
# Cluster ID: 1810f03c-2d74-4111-a930-bd08c484411a
#
# Enter \? for a brief introduction.
#
[email protected]:55693/defaultdb> show databases;
  database_name
+---------------+
  defaultdb
  postgres
  system
(3 rows)

Time: 4.305ms

And it's a pretty big database, so it will be important to identify which tables are most important for users to know about and when they could/should be used. Who can help with this? Ideally, we would even auto-generate this documentation.

[email protected]:55693/defaultdb> show tables from crdb_internal;
         table_name
+---------------------------+
  backward_dependencies
  builtin_functions
  cluster_queries
  cluster_sessions
  cluster_settings
  create_statements
  feature_usage
  forward_dependencies
  gossip_alerts
  gossip_liveness
  gossip_network
  gossip_nodes
  index_columns
  jobs
  kv_node_status
  kv_store_status
  leases
  node_build_info
  node_metrics
  node_queries
  node_runtime_info
  node_sessions
  node_statement_statistics
  partitions
  predefined_comments
  ranges
  ranges_no_leases
  schema_changes
  session_trace
  session_variables
  table_columns
  table_indexes
  tables
  zones
(34 rows)

Time: 1.192ms

@bdarnell
Copy link
Contributor

My first choice is still to create a crdb_catalog table containing the subset of crdb_internal intended for external consumption (or slight variants like switching the ranges/ranges_no_leases pair), but if we don't do that I think it's better to document (parts of) crdb_internal at this point.

I partially withdraw my endorsement of SHOW statements - they're more work than virtual tables and not necessarily any easier to use (virtual tables are more consumable). Sometimes it may be worth it to have a SHOW statement as shorthand for a complex query on virtual tables, but I think virtual tables should be our primary interface here.

@jordanlewis
Copy link
Member

@awoods187, @bdarnell, since we're deciding to document crdb_internal, shouldn't that database be exposed like system?

@jseldess, it is exposed. The big difference between crdb_internal (and pg_catalog and information_schema) and system is that they are schemas, not databases. Every database contains each of the schemas I just mentioned, as well as public. This is important, because these virtual schemas can and will change depending on what database the user is currently connected to. To answer your question, you can see them with SHOW SCHEMAS.

+1 on the (parts of): we don't need to document all of crdb_internal at this point. I would look toward documenting just the parts that we consider stable. That way, it'll be somewhat more clear to users which parts of the schema are available for programmatic use and which might not be.

@jseldess
Copy link
Contributor

jseldess commented Aug 5, 2019

@awoods187, I don't think we'll have the bandwidth to do this for 19.2 unless we de-scope some of the other work happening on the SQL docs team. Are you ok pushing this to later, or is this more urgent than I understand?

@awoods187
Copy link
Contributor

I'm fine with this not making it but I do think we need to do #4978

@jseldess
Copy link
Contributor

jseldess commented Aug 6, 2019

Haven't we build that ability into SQL in 19.2?

@awoods187
Copy link
Contributor

We are building in show statements but those are porcelain and we don't recommend scripting based upon those. For scripting, you'd need to pull a portion of the crdb internal information.

@jseldess jseldess modified the milestones: 19.2, 20.1 Aug 6, 2019
@rmloveland
Copy link
Contributor

Some of the work of documenting crdb_internal from a reference-y POV is handled in PR #4326 (I assume we want to try to update and merge that in 19.2 timeline but IDK)

@ericharmeling
Copy link
Contributor

Eventually we'll want a page for crdb_internal like the page we have for information_schema: https://www.cockroachlabs.com/docs/v19.2/information-schema.html

But we can probably start with something pretty high-level for crdb_internal for 19.2 (at the same level in the TOC as the information schema page), as several 19.2 doc issues involve accessing crdb_internal:

We also reference crdb_internal in several place in live documentation (e.g. https://www.cockroachlabs.com/docs/v19.2/functions-and-operators.html#system-info-functions and https://www.cockroachlabs.com/docs/v19.2/show-jobs.html), without providing too much context. Adding this high-level page on crdb_internal would be helpful and clarifying.

rmloveland added a commit that referenced this issue Nov 4, 2019
Specifically, add docs for the new system tables that provide insight
into:

- Whether data is under-replicated or unavailable

- Violations of data placement requirements

- Which localities are critical

Fixes #4730.

Addresses #2957, #5636.

Relates to #4459.
rmloveland added a commit that referenced this issue Nov 8, 2019
Specifically, add docs for the new system tables that provide insight
into:

- Whether data is under-replicated or unavailable

- Violations of data placement requirements

- Which localities are critical

Fixes #4730.

Addresses #2957, #5636.

Relates to #4459.
rmloveland added a commit that referenced this issue Nov 11, 2019
Specifically, add docs for the new system tables that provide insight
into:

- Whether data is under-replicated or unavailable

- Violations of data placement requirements

- Which localities are critical

Fixes #4730.

Addresses #2957, #5636.

Relates to #4459.
@ericharmeling ericharmeling added P-2 Normal priority; secondary task and removed P-1 High priority; must be done this release labels Jul 15, 2020
@glennfawcett
Copy link

Documenting the crdb_internal tables would be helpful to avoid having to go the the ADMIN UI. I am sure there are many examples but, having the ability to run SQL instead of opening and authenticating a connection to a the admin UI is real helpful.

glenn@glenn-geolabs-0001:~$ ./cockroach node status --insecure
  id |         address          |       sql_address        |  build  |            started_at            |            updated_at            |                     locality                      | is_available | is_live
-----+--------------------------+--------------------------+---------+----------------------------------+----------------------------------+---------------------------------------------------+--------------+----------
   1 | glenn-geolabs-0001:26257 | glenn-geolabs-0001:26257 | v20.1.5 | 2020-08-31 20:01:11.250782+00:00 | 2020-08-31 20:06:53.404982+00:00 | cloud=gce,region=us-west1,zone=us-west1-b         | true         | true
   2 | glenn-geolabs-0002:26257 | glenn-geolabs-0002:26257 | v20.1.5 | 2020-08-31 20:01:13.018698+00:00 | 2020-08-31 20:06:50.668479+00:00 | cloud=gce,region=us-west1,zone=us-west1-b         | true         | true
   3 | glenn-geolabs-0003:26257 | glenn-geolabs-0003:26257 | v20.1.5 | 2020-08-31 20:01:14.841248+00:00 | 2020-08-31 20:06:52.486917+00:00 | cloud=gce,region=us-west1,zone=us-west1-b         | true         | true
   4 | glenn-geolabs-0004:26257 | glenn-geolabs-0004:26257 | v20.1.5 | 2020-08-31 20:01:18.906546+00:00 | 2020-08-31 20:06:52.088676+00:00 | cloud=gce,region=us-east4,zone=us-east4-b         | true         | true
   5 | glenn-geolabs-0005:26257 | glenn-geolabs-0005:26257 | v20.1.5 | 2020-08-31 20:01:22.531374+00:00 | 2020-08-31 20:06:51.214354+00:00 | cloud=gce,region=us-east4,zone=us-east4-b         | true         | true
   6 | glenn-geolabs-0006:26257 | glenn-geolabs-0006:26257 | v20.1.5 | 2020-08-31 20:01:25.991613+00:00 | 2020-08-31 20:06:50.228053+00:00 | cloud=gce,region=us-east4,zone=us-east4-b         | true         | true
   7 | glenn-geolabs-0007:26257 | glenn-geolabs-0007:26257 | v20.1.5 | 2020-08-31 20:01:32.449659+00:00 | 2020-08-31 20:06:52.176828+00:00 | cloud=gce,region=europe-west2,zone=europe-west2-a | true         | true
   8 | glenn-geolabs-0008:26257 | glenn-geolabs-0008:26257 | v20.1.5 | 2020-08-31 20:01:38.140406+00:00 | 2020-08-31 20:06:53.408267+00:00 | cloud=gce,region=europe-west2,zone=europe-west2-a | true         | true
   9 | glenn-geolabs-0009:26257 | glenn-geolabs-0009:26257 | v20.1.5 | 2020-08-31 20:01:43.838436+00:00 | 2020-08-31 20:06:50.114096+00:00 | cloud=gce,region=europe-west2,zone=europe-west2-a | true         | true
(9 rows)```

root@localhost:26257/system> select * from crdb_internal.gossip_nodes;
node_id | network | address | advertise_address | sql_network | sql_address | advertise_sql_address | attrs | locality | cluster_name | server_version | build_tag | started_at | is_live | ranges | leases
----------+---------+--------------------------+--------------------------+-------------+--------------------------+--------------------------+-------+---------------------------------------------------+--------------+----------------+-----------+----------------------------------+---------+--------+---------
1 | tcp | glenn-geolabs-0001:26257 | glenn-geolabs-0001:26257 | tcp | glenn-geolabs-0001:26257 | glenn-geolabs-0001:26257 | [] | cloud=gce,region=us-west1,zone=us-west1-b | | 20.1 | v20.1.5 | 2020-08-31 20:01:11.250782+00:00 | true | 16 | 4
2 | tcp | glenn-geolabs-0002:26257 | glenn-geolabs-0002:26257 | tcp | glenn-geolabs-0002:26257 | glenn-geolabs-0002:26257 | [] | cloud=gce,region=us-west1,zone=us-west1-b | | 20.1 | v20.1.5 | 2020-08-31 20:01:13.018698+00:00 | true | 17 | 4
3 | tcp | glenn-geolabs-0003:26257 | glenn-geolabs-0003:26257 | tcp | glenn-geolabs-0003:26257 | glenn-geolabs-0003:26257 | [] | cloud=gce,region=us-west1,zone=us-west1-b | | 20.1 | v20.1.5 | 2020-08-31 20:01:14.841248+00:00 | true | 15 | 4
4 | tcp | glenn-geolabs-0004:26257 | glenn-geolabs-0004:26257 | tcp | glenn-geolabs-0004:26257 | glenn-geolabs-0004:26257 | [] | cloud=gce,region=us-east4,zone=us-east4-b | | 20.1 | v20.1.5 | 2020-08-31 20:01:18.906546+00:00 | true | 18 | 3
5 | tcp | glenn-geolabs-0005:26257 | glenn-geolabs-0005:26257 | tcp | glenn-geolabs-0005:26257 | glenn-geolabs-0005:26257 | [] | cloud=gce,region=us-east4,zone=us-east4-b | | 20.1 | v20.1.5 | 2020-08-31 20:01:22.531374+00:00 | true | 19 | 4
6 | tcp | glenn-geolabs-0006:26257 | glenn-geolabs-0006:26257 | tcp | glenn-geolabs-0006:26257 | glenn-geolabs-0006:26257 | [] | cloud=gce,region=us-east4,zone=us-east4-b | | 20.1 | v20.1.5 | 2020-08-31 20:01:25.991613+00:00 | true | 19 | 4
7 | tcp | glenn-geolabs-0007:26257 | glenn-geolabs-0007:26257 | tcp | glenn-geolabs-0007:26257 | glenn-geolabs-0007:26257 | [] | cloud=gce,region=europe-west2,zone=europe-west2-a | | 20.1 | v20.1.5 | 2020-08-31 20:01:32.449659+00:00 | true | 18 | 3
8 | tcp | glenn-geolabs-0008:26257 | glenn-geolabs-0008:26257 | tcp | glenn-geolabs-0008:26257 | glenn-geolabs-0008:26257 | [] | cloud=gce,region=europe-west2,zone=europe-west2-a | | 20.1 | v20.1.5 | 2020-08-31 20:01:38.140406+00:00 | true | 18 | 3
9 | tcp | glenn-geolabs-0009:26257 | glenn-geolabs-0009:26257 | tcp | glenn-geolabs-0009:26257 | glenn-geolabs-0009:26257 | [] | cloud=gce,region=europe-west2,zone=europe-west2-a | | 20.1 | v20.1.5 | 2020-08-31 20:01:43.838436+00:00 | true | 18 | 3
(9 rows)

@glennfawcett
Copy link

When is this scheduled to be addressed?

@jseldess
Copy link
Contributor

@ericharmeling, @awoods187, any insights for @glennfawcett?

@ericharmeling
Copy link
Contributor

@glennfawcett Thanks for pinging on this. I opened a PR a while back to fill this gap in our docs (#5454), but never got it over the finish line. I'll update that PR within the week and get it back in review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
O-product P-2 Normal priority; secondary task
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants