[Bug]: hypertable_size is really expensive for large hypertables #6006

Closed · Fixed by #6463
mkindahl opened this issue Aug 24, 2023 · 2 comments
mkindahl (Contributor) commented Aug 24, 2023

What type of bug is this?

Performance issue

What subsystems and features are affected?

Command processing

What happened?

Running hypertable_size on really large hypertables can become excessively expensive, in particular for tables with a large number of chunks, and especially when chunk sizes do not change for anything but the most recent chunks.

TimescaleDB version affected

2.12.0

PostgreSQL version used

15.3

What operating system did you use?

Ubuntu 22.04 x64

What installation method did you use?

Source

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

No response

How can we reproduce the bug?

CREATE TABLE conditions(
    time timestamp with time zone NOT NULL,
    device_id INTEGER,
    temperature FLOAT,
    humidity FLOAT
);

SELECT * FROM create_hypertable('conditions', 'time', chunk_time_interval => '1 hour'::interval);

INSERT INTO conditions
SELECT time, (random()*3 + 1)::int, random()*80 - 40, random()*100
FROM generate_series(now() - INTERVAL '2 years', now(), '1 minute') AS time;

SELECT hypertable_name, hypertable_size(format('%I', hypertable_name)::regclass)
FROM timescaledb_information.hypertables;
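
For scale, two years of data with a 1-hour chunk_time_interval yields on the order of 17,500 chunks, which is what makes the size calculation so expensive here. A quick way to confirm the chunk count (an illustrative query, not part of the original report):

SELECT count(*) AS chunk_count
FROM timescaledb_information.chunks
WHERE hypertable_name = 'conditions';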

Tasks

  1. auto-backport-not-done (nikkhils)
@mkindahl mkindahl added the bug label Aug 24, 2023
@mkindahl mkindahl assigned mkindahl and unassigned mkindahl Oct 25, 2023
fabriziomello (Contributor) commented Nov 22, 2023

@mkindahl I'm not sure we'll see a big gain from rewriting it in C, since the real problem is the underlying _timescaledb_functions.relation_size calls, which rely on the Postgres internal function pg_total_relation_size, which in turn reads the size information via the stat system call. The idea of caching is good, of course, but how would the cache be updated? When the chunks are populated? Or via an async job?
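
For context, a rough sketch of what the per-hypertable size collection boils down to (an illustrative query, not the extension's actual implementation): pg_total_relation_size() has to stat the segment files of each chunk's heap, its indexes, and its TOAST table, so the cost grows with the number of chunks.

SELECT sum(pg_total_relation_size(chunk)) AS total_bytes
FROM show_chunks('conditions') AS chunk;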

Another idea is to implement a variation of this function that does an approximate calculation of the relation sizes using pg_class.relpages (more or less this POC: fabriziomello@3afbf90), but I hit a problem: relpages for the TOAST index is not updated. I've sent a message about it to pgsql-hackers but haven't received an answer yet.
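
A minimal sketch of that kind of approximation, assuming pg_class.relpages statistics are reasonably fresh (i.e. after VACUUM/ANALYZE) and keeping in mind the stale TOAST-index relpages problem mentioned above; this is not the POC code itself:

-- Sum relpages of each chunk's heap, its indexes, and its TOAST table,
-- then convert pages to bytes. relpages is only an estimate, so the
-- result is approximate.
SELECT sum(sub.relpages) * current_setting('block_size')::bigint AS approx_bytes
FROM show_chunks('conditions') AS chunk,
LATERAL (
    SELECT c.relpages FROM pg_class c WHERE c.oid = chunk
    UNION ALL
    SELECT i.relpages FROM pg_index x JOIN pg_class i ON i.oid = x.indexrelid
    WHERE x.indrelid = chunk
    UNION ALL
    SELECT t.relpages FROM pg_class c JOIN pg_class t ON t.oid = c.reltoastrelid
    WHERE c.oid = chunk
) AS sub;
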
In any case, with the current situation I was able to tweak Postgres a bit to make this calculation faster; have a look at the example below:

452071 (leader) fabrizio=# SELECT hypertable_name, hypertable_size(format('%I', hypertable_name)::regclass)
FROM timescaledb_information.hypertables;
 hypertable_name | hypertable_size 
-----------------+-----------------
 conditions      |      1004732416
(1 row)

Time: 857,838 ms
452071 (leader) fabrizio=# ALTER FUNCTION _timescaledb_functions.relation_size(REGCLASS) STABLE PARALLEL SAFE ROWS 1;
ALTER FUNCTION hypertable_size(REGCLASS) STABLE PARALLEL SAFE;
ALTER FUNCTION hypertable_detailed_size(REGCLASS) STABLE PARALLEL SAFE ROWS 1;
ALTER FUNCTION _timescaledb_functions.hypertable_local_size(name, name) STABLE PARALLEL SAFE ROWS 1;
ALTER FUNCTION
Time: 48,748 ms
ALTER FUNCTION
Time: 28,435 ms
ALTER FUNCTION
Time: 27,730 ms
ALTER FUNCTION
Time: 27,852 ms
452071 (leader) fabrizio=# SELECT hypertable_name, hypertable_size(format('%I', hypertable_name)::regclass)
FROM timescaledb_information.hypertables;
 hypertable_name | hypertable_size 
-----------------+-----------------
 conditions      |      1004732416
(1 row)

Time: 584,942 ms

These small changes decreased the total time from 857,838 ms to 584,942 ms, a ~32% performance improvement.

@nikkhils nikkhils self-assigned this Nov 23, 2023
nikkhils (Contributor) commented Dec 22, 2023

PR for this #6463

@nikkhils nikkhils linked a pull request Dec 22, 2023 that will close this issue