sql: expose table on-disk size #20712
Comments
@RaduBerinde and perhaps @tschottdorf, do you mind giving an approximate estimate of the time needed to do this? It doesn't sound like it would take too long. `SELECT .. FROM system.table_statistics` makes sense to me.
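As a sketch, that could look something like this (the `approx_disk_bytes` column is hypothetical; nothing like it exists in the schema yet):

```sql
-- Hypothetical shape: on-disk size surfaced as one more statistic,
-- alongside the per-table statistics proposed in #20323.
SELECT table_id, approx_disk_bytes
FROM system.table_statistics
WHERE table_id = 53;
```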
Adding a …
@petermattis makes sense to me, although my understanding was that we don't do much in terms of documenting crdb_internal. @jseldess, correct me if I'm wrong? I think we are rapidly coming to the point where we have to consider also exposing the admin UI's metrics through either SQL or some other API, for users trying to automate things based on metrics. So I'm a bit concerned about choosing one path here that we then have to change when we finally have time to think about how to expose these metrics in a way that is easily consumable by users, in the format they want. What are your thoughts on that?
That was the basic agreement in the past, yes, @dianasaur323. @knz was involved in that decision. Could we add a SQL statement that queries that internal table?
I'm in agreement that we should be able to access our internal metrics via SQL. Adding SQL statements for each of these would likely result in an explosion of statements. I believe the hesitance in the past to document `crdb_internal` was to give ourselves freedom to make backwards-incompatible changes. But we can either revisit that decision or design another set of virtual tables for which we promise to maintain compatibility.
In this case, should we just do the `crdb_internal` virtual table approach for this specific metric and revisit when more people start asking for programmatic access to these metrics?
I think the virtual table makes sense for now. Querying this programmatically is generally a little dangerous too, because the work done in computing these quantities isn't trivial, so we shouldn't exactly advertise it. But this also constrains us somewhat: if we make this a virtual table with strawman schema `(table_id, approx_size_bytes)`, will any query to this table compute the stats for *all* tables? We should avoid that. If that restriction makes things difficult, a more straightforward way is to make it a function (so that it can only be invoked for individual table IDs).
With a little bit of work we can plumb a filter to the virtual table generator function so it only computes it for one table if there is a `WHERE table_id = x` clause.
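As a sketch, the access pattern would look like this (the `crdb_internal.table_sizes` vtable name is made up here; the `(table_id, approx_size_bytes)` schema is the strawman from the previous comment):

```sql
-- With the filter pushed into the vtable generator, only the stats
-- for table 53 are computed, not those of every table in the cluster.
SELECT approx_size_bytes
FROM crdb_internal.table_sizes
WHERE table_id = 53;
```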
@vivekmenezes do you mind seeing if anyone would have time to do this? It would be nice, since it is a customer request.
https://www.postgresql.org/docs/current/static/functions-admin.html#FUNCTIONS-ADMIN-DBSIZE The entire page is a list of the various admin-related functions that we could support. I'm in favor of figuring out what we want to support and building a project around it, rather than doing a one-off feature request.
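For reference, the size-reporting builtins on that page include the following (these are PostgreSQL's signatures, not something CRDB implements today):

```sql
SELECT pg_table_size('users');           -- table data, excluding indexes
SELECT pg_indexes_size('users');         -- all indexes attached to the table
SELECT pg_total_relation_size('users');  -- table plus indexes and TOAST
SELECT pg_database_size('mydb');         -- an entire database
```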
@vivekmenezes I think we should leverage the flexibility we have in …
👍 on publicizing a set of vtables that we're willing to expose in docs and to users. I would, however, use two separate namespaces -- perhaps …. I'd suggest migrating most of those we have in …. Also, I think it's OK to have some redundancy between the namespaces.
I'd keep …
Just want to put a soft -1 on introducing another top-level virtual table namespace until we nail down what's going on with our schemas vs. databases. UI tools show …. Perhaps we could use this moment to discuss the UX around adding a top-level database that contains several schemas for internal purposes. For example, the …
I agree that we should have that larger discussion; the things we expose to users are getting pretty spread out across different interfaces. That being said, it sounds like it's time for someone to take ownership of this and come up with a more complete proposal.
Anyone against implementing `pg_table_size()` for this issue?
+1 to that suggestion - I've seen several PG admin UI tools that use that builtin.
The API call we have right now gives the approximate total replicated size of a table, and it needs to fan out across the cluster and also scan from the meta ranges. Hooking this up to `pg_table_size()` …
I imagine a system tracking table/database statistics and putting them in one place, with pg_table_size() looking up the stats table.
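As a sketch of that design, with hypothetical names throughout (no such table ships today): a background job periodically refreshes a stats table, and the builtin becomes a cheap point lookup:

```sql
-- Hypothetical stats table, refreshed asynchronously by a background job.
-- pg_table_size('users') would then reduce to roughly this lookup:
SELECT size_bytes
FROM system.table_size_stats
WHERE table_name = 'users';
```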
It seems like this project is expanding a bit in scope. How long do you think this would take to implement, and is it still reasonable to get this done by 2.0?
I doubt anyone's going to get to it anytime soon.
Let's do this in 2.1 then. Moving milestone.
https://forum.cockroachlabs.com/t/index-size-on-disk/1519 is somewhat related. When evaluating CRDB (and comparing it with other databases), as well as getting a handle on primary-key choices to best suit one's use cases, it's really useful to know how the increase of table rows affects the growth of indexes.
I need a programmatic way to determine the disk usage of each database as well. From what I can see, this is still not resolved. I was originally looking for …. Any ETA on the DB size issue?
Is there an alternate way to estimate size from SQL?
Zendesk ticket #5125 has been linked to this issue.
@tbg isn't this possible (but slow) with something like:
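```sql
-- Sum the approximate sizes of all ranges belonging to one database
-- ('mydb' is a placeholder; later replies run this same query shape).
SELECT sum(range_size)
FROM crdb_internal.ranges
WHERE database_name = 'mydb';
```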
@jordanlewis the query above does not provide any output on my v19.2.5 clusters. What version were you testing this with?
On 19.2.5, the …
From one of my production clusters:

```
user@host:26257/defaultdb> select sum(range_size)/1000 from crdb_internal.ranges where database_name = 'defaultdb';
  ?column?
+----------+
    NULL
(1 row)

Time: 31.687465ms
```

Version:

```
# cockroach version
Build Tag:    v19.2.5
Build Time:   2020/03/16 18:27:12
Distribution: CCL
Platform:     linux amd64 (x86_64-unknown-linux-gnu)
Go Version:   go1.12.12
C Compiler:   gcc 6.3.0
Build SHA-1:  4f36d0c62435596ca103454e113ebe8e55f005de
Build Type:   release
```
@charl could show …
@robert-s-lee, gotcha.

```
user@host:26257/oneconfig> select sum(range_size)/1000 from crdb_internal.ranges where database_name = 'oneconfig';
   ?column?
+-------------+
  3697358.846
(1 row)

Time: 439.516065ms
```
Note that anything based off … Also, anything based off … So, all in all, the solutions discussed so far are band-aids and no substitute for a properly designed feature able to report table sizes without such a large expense in resources.
Hi there, any chance the PR will be merged?
As of #20627, the admin UI computes the approximate on-disk size of tables. However, it makes sense to expose this information programmatically via SQL. All that's needed is to find a place to expose it and to populate it from `(*adminServer).TableStats`. This call isn't extremely cheap (it has fan-out and needs to scan a subset of the meta entries), but caching could be added as necessary.

Perhaps there is a connection to be made with #20323. It seems reasonable for the approximate on-disk size of a table to be exposed via `SELECT .. FROM system.table_statistics` (cc @RaduBerinde).

@dianasaur323 for triage.
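For completeness, the stats backing the admin UI can also be fetched directly; a sketch, assuming the HTTP route that serves `(*adminServer).TableStats` and its `approximate_disk_bytes` response field:

```
# Assumed admin API route for (*adminServer).TableStats on a local node:
curl -s http://localhost:8080/_admin/v1/databases/mydb/tables/mytable/stats
# The JSON response should include an approximate_disk_bytes field.
```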
Jira issue: CRDB-5915
Epic: CRDB-24527